arXiv: 1506.02984v2 [math.ST] 3 Nov 2016 


Parameter stability and semiparametric inference in time-varying ARCH models


Lionel Truquet 


Abstract 


In this paper, we develop a complete methodology for detecting time-varying/non time-varying parameters
in ARCH processes. For this purpose, we estimate and test various semiparametric versions of
the time-varying ARCH model (tv-ARCH) which include two well known non stationary ARCH type 
models introduced in the econometric literature. Using kernel estimation, we show that non time-varying 
parameters can be estimated at the usual parametric rate of convergence and for a Gaussian noise, we 
construct estimates that are asymptotically efficient in a semiparametric sense. Then we introduce two 
statistical tests which can be used for detecting non time-varying parameters or for testing the second 
order dynamic. An information criterion for selecting the number of lags is also provided. We illustrate 
our methodology with several real data sets. 

1 Introduction 

The modeling of financial data using nonstationary time series has recently received considerable attention 
both in econometrics and in statistics. For classical daily series such as stock market indices or currency 
exchange rates, the stationarity assumption seems often incompatible with a long history of data and the 
necessity of using non stationary ARCH models has been pointed out by several authors. See for instance 
Mikosch and Starica (2004), Granger and Starica (2005), Engle and Rangel (2008), Fryzlewicz et al. (2008) 
and the references therein. However, it is difficult to find in the literature a consensus for representing
nonstationary ARCH models. A natural approach is to allow time-varying parameters in the classical ARCH
model of Engle (1982). Such an extension has been proposed by Dahlhaus and Subba Rao (2006) with 
the so-called time-varying ARCH model (tv-ARCH). The tv-ARCH processes are defined by the recursive 
equations 



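$$X_t = \xi_t \sigma_t, \qquad \sigma_t^2 = a_0(t/T) + \sum_{j=1}^{p} a_j(t/T)\, X_{t-j}^2, \qquad p+1 \le t \le T, \tag{1.1}$$

where for $0 \le j \le p$, $a_j$ is a smooth function and $\xi$ a strong white noise with variance 1. Since they can be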
locally approximated by stationary ARCH processes, the tv-ARCH processes are called locally stationary 
(the notion of local stationarity is introduced in Dahlhaus (1997) for linear processes but the meaning of 
local stationarity for the non linear tv-ARCH can be found in Dahlhaus and Subba Rao (2006)). From this
important feature, a nice asymptotic theory can be developed for estimation of parameters, in particular
local inference methods such as the local Quasi-maximum likelihood estimation studied in Dahlhaus and
Subba Rao (2006), the local weighted least-squares estimation developed in Fryzlewicz et al. (2008) or the
recursive online algorithms considered by Dahlhaus and Subba Rao (2007). In Fryzlewicz et al. (2008),
it was shown that tv-ARCH processes provide good fits and accurate forecasts for some financial series.
However, statistical inference in model (1.1) is complex, even for large samples, because $p + 1$ functions
have to be estimated using nonparametric methods. Thus, in practice, reducing complexity can be useful
to improve model fit or forecast accuracy. For example, Granger and Starica (2005) have shown that the
simple model 


$$X_t = \sigma(t/T)\, \xi_t, \qquad 1 \le t \le T, \tag{1.2}$$

with a smooth deterministic function $\sigma : [0,1] \to \mathbb{R}_+$ can already produce significantly better forecasts for
the returns of the S&P 500 index than the classical GARCH(1,1) model. A process of type (1.2) can be seen
as a tv-ARCH process with zero lag coefficients. Note that model (1.2) does not assume autocorrelation for
the absolute values or the squares of the process (a correlation property often called second order dynamic
in the literature) but only some changes in the unconditional variance. In Granger and Starica (2005), it is
argued that most of the dynamics of the S&P index can be explained with a time-varying unconditional variance.

But nonstationarity and second order correlation can also be combined in a very simple way, assuming 
constant lag coefficients in (1.1): 


$$X_t = \xi_t \sigma_t, \qquad \sigma_t^2 = a_0(t/T) + \sum_{j=1}^{p} a_j X_{t-j}^2. \tag{1.3}$$

Model (1.3) combines a time-varying unconditional variance compatible with the analysis of Granger and
Starica (2005) and a second order dynamic for the series with a single nonparametric component. Note also
that a process $(X_t)_t$ defined by equations (1.3) can be written using the multiplicative form


$$X_t = \sqrt{a_0(t/T)}\; Y_t, \tag{1.4}$$

where

$$Y_t = \xi_t \sqrt{1 + \sum_{j=1}^{p} a_j\, \frac{a_0((t-j)/T)}{a_0(t/T)}\, Y_{t-j}^2} \;\approx\; \xi_t \sqrt{1 + \sum_{j=1}^{p} a_j Y_{t-j}^2}$$

if we neglect the ratio $a_0((t-j)/T)/a_0(t/T)$, which is of order $1 + 1/T$ when the function $a_0$ is positive and Lipschitz
continuous over $[0,1]$. One can also notice that writing the model with or without the latter approximation
leads to two processes that are both approximated by the same stationary ARCH processes with parameters
$a_0(u), a_1, \ldots, a_p$ (see Lemma 1 for this kind of approximation). In the stationary case, we recall that
multiplying an ARCH process by a positive constant is equivalent to rescaling the initial intercept coefficient.
Then for a large sample size $T$, the process $(Y_t)_t$ behaves as a stationary ARCH process and $a_0(\cdot)$ is (up to
a constant) the time-varying unconditional variance of the process $(X_t)_t$. Such a multiplicative form for
ARCH models was first considered by Engle and Rangel (2008) with the so-called Spline-GARCH
model, which is written as model (1.4) but with a GARCH(1,1) process $(Y_t)_t$.

Since the previous models satisfy the inclusions (1.2) $\subset$ (1.3) $\subset$ (1.1), a natural question for any real data
set is to test some properties of the lag coefficients. Testing the constancy of the lag coefficients can help
to decide between model (1.3) and model (1.1), while testing the second order dynamic in model (1.3) is
useful to determine whether model (1.2) provides a sufficient fit. Statistical tools to help practitioners choose
among the three important specifications described above do not seem to be available in the literature, except in a
recent paper of Patilea and Raissi (2014) which introduces a test for the second order dynamic in model
(1.3).
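To make the nesting (1.2) $\subset$ (1.3) $\subset$ (1.1) concrete, the following minimal Python sketch simulates the three specifications from one recursion. The function name `simulate_tv_arch`, the burn-in length, the Gaussian noise and the particular coefficient curves are illustrative assumptions and are not taken from the paper or its Matlab code. The burn-in with coefficients frozen at $u = 0$ mimics the convention that the path before time 1 comes from a stationary ARCH process with coefficients $a_j(0)$.

```python
import numpy as np

def simulate_tv_arch(T, a_funcs, rng=None, burn_in=200):
    """Simulate X_t = xi_t * sigma_t with
    sigma_t^2 = a_0(t/T) + sum_j a_j(t/T) * X_{t-j}^2  (model (1.1)).

    a_funcs: list of p+1 callables on [0, 1] (hypothetical interface).
    Constant callables for j >= 1 give model (1.3); a single callable gives (1.2).
    """
    rng = np.random.default_rng(rng)
    p = len(a_funcs) - 1
    n = burn_in + T
    xi = rng.standard_normal(n)          # Gaussian noise is an assumption
    x = np.zeros(n)
    for s in range(n):
        # rescaled time t/T, frozen at 0 during the burn-in period
        u = max(s - burn_in + 1, 0) / T
        sigma2 = a_funcs[0](u)
        for j in range(1, p + 1):
            if s - j >= 0:
                sigma2 += a_funcs[j](u) * x[s - j] ** 2
        x[s] = xi[s] * np.sqrt(sigma2)
    return x[burn_in:]

# Illustrative coefficient curves (assumptions, chosen so that sup a_1 < 1).
a0 = lambda u: 0.5 + 0.4 * np.sin(2 * np.pi * u)
x_11 = simulate_tv_arch(1000, [a0, lambda u: 0.1 + 0.3 * u], rng=0)  # model (1.1)
x_13 = simulate_tv_arch(1000, [a0, lambda u: 0.3], rng=0)            # model (1.3)
x_12 = simulate_tv_arch(1000, [a0], rng=0)                           # model (1.2)
```

With the same seed, the three paths share the same noise sequence, so differences between them come only from the lag coefficients.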

In this paper, we propose a general approach for estimating an arbitrary subset of non time-varying
coefficients in tv-ARCH processes. Our estimators are $\sqrt{T}$-consistent and we will also study the
semiparametric asymptotic efficiency of our method when the noise is Gaussian. Using these results, we construct
two statistical tests. The first test can be used to decide whether a given subset of parameters is time-varying
or not. This test is based on an $L^2$ distance between a nonparametric kernel estimator of the coefficients
and the semiparametric estimator introduced in this paper. The second test can be used for deciding whether the
constant parameters are different from zero. Various applications can be considered as simple particular
cases of our methodology: testing model (1.3) versus model (1.1), testing model (1.2) versus model (1.3),
estimating parameters and selecting lag variables in models (1.3) or (1.1). When some coefficients are
assumed to be non time-varying in (1.1), the decomposition $X_t^2 = \sigma_t^2 + (\xi_t^2 - 1)\sigma_t^2$ leads to semiparametric
inference in a time-varying regression model. Detecting and estimating a parametric component in general
time-varying regression models has been considered recently by Zhang and Wu (2012). However, these
authors do not consider the case of tv-ARCH processes with optimal moment condition for the marginal
distribution, asymptotic semiparametric efficiency for the estimation and Lipschitz continuity for the
time-varying coefficients. Moreover, our approach for estimating non time-varying coefficients is quite different.

We applied our methodology to three real data sets: the daily exchange rates between the US Dollar 
and the Euro or between the US Dollar and the Indian Rupee and the FTSE index. For the three series
of interest, a non time-varying intercept is clearly rejected over the considered period. The conclusion for 
the lag coefficients depends on the series. In fitting model (1.3), we also found that incorporating non 
stationarity reduces the values of lag parameters with respect to the stationary case. Then the time-varying 
unconditional variance has an important contribution to volatility. 

The paper is organized as follows. In Section 2, we introduce our notations and we describe the basis 
of our method for statistical inference in a tv-ARCH model for which some coefficients are assumed to 
be non time-varying. In Section 3, we give the asymptotic results for our estimators and we discuss the 
problem of semiparametric asymptotic efficiency using the LAN theory. Statistical tests and their practical
implementation are considered in Section 4, and Section 5 is devoted to real data applications. All the proofs
of our results are postponed to the supplementary material, which also contains several simulation studies
showing the good behavior of our methodology. The Matlab codes and the data sets discussed in Section 5
are available at the URL

https://github.com/time-varying/tests-and-estimation-for-tv-ARCH- 


2 Semiparametric volatility and tv-ARCH processes 

2.1 Formulation and notations 

In this section, we consider semiparametric versions of model (1.1), assuming that some of the ARCH
coefficients are not time-varying. For $t \in [\![p+1, T]\!] = [p+1, T] \cap \mathbb{N}$, let $M_t$ and $N_t$ be two random vectors
of size $m$ and $n$ respectively, with $m + n = p + 1$ and defined as follows. We split the interval $[\![0, p]\!]$ into
two parts $\{q_1, q_2, \ldots, q_m\}$ and $\{r_1, r_2, \ldots, r_n\}$ with $q_1 < \cdots < q_m$ and $r_1 < \cdots < r_n$. If $q_1 = 0$, we set
$M_t = (1, X_{t-q_2}^2, \ldots, X_{t-q_m}^2)'$, with the convention $M_t = 1$ if $m = 1$. If $q_1 > 0$, we set $M_t = (X_{t-q_1}^2, \ldots, X_{t-q_m}^2)'$.

The vector $N_t$ is defined similarly, replacing the $q_j$'s with the $r_j$'s. In particular, the coordinates of the random
vectors $M_t$ and $N_t$ form a bipartition of the set $\{1, X_{t-1}^2, \ldots, X_{t-p}^2\}$. Now, we assume that the coefficients
vector $\beta = (a_{r_1}, \ldots, a_{r_n})'$ is constant. We also set $\alpha_t = (a_{q_1}(t/T), \ldots, a_{q_m}(t/T))'$. Then the model writes

$$X_t = \xi_t \sigma_t, \qquad \sigma_t^2 = M_t'\alpha_t + N_t'\beta, \tag{2.1}$$

for $t = p+1, \ldots, T$. Throughout this paper, we will assume that for every realization $\omega$ in the probability space,
$(X_t(\omega))_{t \le 0}$ is a path of a stationary ARCH process with noise $\xi$ and coefficients $a_j(0)$. From this convention,
one can get a local approximation of a tv-ARCH process by stationary ARCH processes with parameters
$(a_0(u), \ldots, a_p(u))$. See Lemma 1 in the supplementary material for details.

In Section 4, a statistical test will be given for testing $H_0$: $(a_{r_1}, \ldots, a_{r_n})$ is constant. The rest of this
section is devoted to statistical inference in model (1.1) under the null hypothesis $H_0$. The corresponding
estimation of parameter $\beta$ will be necessary to construct the test.


2.2 Estimators of the parametric part $\beta$


Considering the square of the process (2.1), statistical inference in model (2.1) can be viewed as a linear
regression problem. More precisely, we have for $t \ge p + 1$,

$$X_t^2 = M_t'\alpha_t + N_t'\beta + (\xi_t^2 - 1)\sigma_t^2. \tag{2.2}$$

In the sequel, we consider a sequence of weights $(W_t)_{p+1 \le t \le T}$ such that $W_t$ is a measurable function of $t$, $T$
and $X_{t-1}, \ldots, X_{t-p}$. For stating our results, we will only consider sequences of the form

$$W_t = \Bigl(\gamma_0(t/T) + \sum_{j=1}^{p} \gamma_j(t/T)\, X_{t-j}^2\Bigr)^{-2}, \tag{2.3}$$


where the $\gamma_j$'s are positive and Lipschitz continuous functions defined over $[0,1]$. The use of this kind of
weights is classical in weighted least squares estimation in order to relax moment conditions on the marginal
distribution or to gain efficiency. The first goal of our procedure is to estimate the parameter $\beta$. This is
the most difficult part of our methodology since a $\sqrt{T}$-consistent estimate is expected. Once an estimate
$\hat\beta$ with the classical parametric rate of convergence is available, a pointwise estimate of parameter $\alpha(\cdot)$ can
easily be obtained. One can just plug $\hat\beta$ in (2.2) and apply standard nonparametric methods (in this paper
we will use the local weighted least-squares method studied in Fryzlewicz et al. (2008)). Our aim here is to
first eliminate the nonparametric component $M_t'\alpha_t$. Our approach is classical in the setting of partially linear
models, for which the regression function involves a parametric component and a nonparametric component
(see for example Hardle et al. (2000), Chapter 6 for some results in the case of time series). However,
our method is based on nonparametric estimation of linear projections of $\sqrt{W_t}X_t^2$ and $\sqrt{W_t}N_t$ onto the
subspace generated by the components of $\sqrt{W_t}M_t$ instead of a nonparametric estimation of the conditional
expectations. Moreover, our two-step approach involving some weights and leading to semiparametric
efficient estimates is not common for nonstationary time series and no existing results from the theory of
partially linear models can be used here for our purpose. Here, our approach can also be interpreted as
a partial regression. For stationary ARCH processes, estimation of the whole set of parameters using a
regression model for the squares and least squares estimation has been studied by Bose and Mukherjee
(2003) and Horvath and Liese (2004).

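Now we introduce our estimator. We first multiply both sides of equation (2.2) by $\sqrt{W_t}$. If
$\mathcal{P}_t(\sqrt{W_t}X_t^2)$ and $\mathcal{P}_t(\sqrt{W_t}N_t)$ denote the (componentwise) orthogonal projections of $\sqrt{W_t}X_t^2$ and $\sqrt{W_t}N_t$
onto the linear subspace generated by the coordinates of $\sqrt{W_t}M_t$, it is easily seen that

$$\sqrt{W_t}X_t^2 - \mathcal{P}_t\bigl(\sqrt{W_t}X_t^2\bigr) = \bigl[\sqrt{W_t}N_t - \mathcal{P}_t\bigl(\sqrt{W_t}N_t\bigr)\bigr]'\beta + (\xi_t^2 - 1)\sqrt{W_t}\,\sigma_t^2. \tag{2.4}$$

The use of these orthogonal projections is natural in order to eliminate the nuisance parameter $\alpha_t$ and to
get a partial regression involving parameter $\beta$ only. Let us introduce some notations for expressing these
projections. For $t \ge p + 1$, we set

$$q_{1,t} = s_{3,t}^{-1}s_{1,t}, \qquad q_{2,t} = s_{3,t}^{-1}s_{2,t},$$

where

$$s_{3,t} = E\bigl(W_tM_tM_t'\bigr), \qquad s_{1,t} = E\bigl(W_tM_tX_t^2\bigr), \qquad s_{2,t} = E\bigl(W_tM_tN_t'\bigr).$$

Then setting

$$V_t = X_t^2 - M_t'q_{1,t}, \qquad O_t = N_t - q_{2,t}'M_t,$$

we have

$$\sqrt{W_t}X_t^2 - \mathcal{P}_t\bigl(\sqrt{W_t}X_t^2\bigr) = \sqrt{W_t}V_t, \qquad \sqrt{W_t}N_t - \mathcal{P}_t\bigl(\sqrt{W_t}N_t\bigr) = \sqrt{W_t}O_t,$$

and equation (2.4) writes

$$\sqrt{W_t}V_t = \sqrt{W_t}O_t'\beta + (\xi_t^2 - 1)\sqrt{W_t}\,\sigma_t^2.$$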

The idea is now to use a least squares estimator for $\beta$. Of course, in order to obtain a feasible estimator, it is
necessary to first estimate the two quantities $q_{1,t}$ and $q_{2,t}$. To this end, we consider

$$\hat s_{3,b,t} = \sum_{i=p+1}^{T} k_{t,i}(b)\,W_iM_iM_i', \qquad \hat s_{1,b,t} = \sum_{i=p+1}^{T} k_{t,i}(b)\,W_iM_iX_i^2,$$

$$\hat s_{2,b,t} = \sum_{i=p+1}^{T} k_{t,i}(b)\,W_iM_iN_i', \qquad k_{t,j}(b) = \frac{K\bigl(\frac{t-j}{Tb}\bigr)}{\sum_{i=p+1}^{T} K\bigl(\frac{t-i}{Tb}\bigr)},$$

where $K$ is a kernel and $b > 0$ is a bandwidth parameter. Throughout this paper, the kernel $K$ is assumed
to be absolutely continuous and with support $[-1,1]$. Then we set

$$\hat q_{1,b,t} = \bigl(\hat s_{3,b,t}\bigr)^{-1}\hat s_{1,b,t}, \qquad \hat q_{2,b,t} = \bigl(\hat s_{3,b,t}\bigr)^{-1}\hat s_{2,b,t}. \tag{2.5}$$


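Now we introduce the following notations. The quantities

$$\hat V_t = X_t^2 - M_t'\hat q_{1,b,t}, \qquad \hat O_t = N_t - \hat q_{2,b,t}'M_t$$

estimate $V_t$ and $O_t$ respectively. Our estimator of parameter $\beta$ will be denoted by $\hat\beta$ and minimizes the
function $\mathcal{L}_W$ defined by

$$\mathcal{L}_W(\beta) = \sum_{t=p+1}^{T} W_t\bigl(\hat V_t - \hat O_t'\beta\bigr)^2.$$

We get

$$\hat\beta = \Bigl(\sum_{t=p+1}^{T} W_t\hat O_t\hat O_t'\Bigr)^{-1}\sum_{t=p+1}^{T} W_t\hat O_t\hat V_t. \tag{2.6}$$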


It is now possible to define an estimate of parameter $\alpha(u)$ for $u \in [0,1]$ by minimizing the function

$$\alpha \mapsto \sum_{i=p+1}^{T} k_{t,i}(b')\,W_i\bigl(X_i^2 - N_i'\hat\beta - M_i'\alpha\bigr)^2$$

for an integer $t \in [\![1, T]\!]$ such that $|u - t/T| \le 1/T$ (e.g. $t = [Tu]$ where $[x]$ denotes the integer part of a real
number $x$). Since the nonparametric estimation of the function $\alpha$ requires a less restrictive assumption on
the bandwidth parameter than the estimation of parameter $\beta$, we use a new bandwidth $b'$. This
leads to the estimate

$$\hat\alpha_t = \hat q_{1,b',t} - \hat q_{2,b',t}\,\hat\beta. \tag{2.7}$$

Note. The expressions of $\hat q_{1,b,t}$ and $\hat q_{2,b,t}$ involve the inverse of the matrix $\hat s_{3,b,t}$. One can show that

$$P\bigl(\det(\hat s_{3,b,t}) = 0 \text{ for some } t \le T\bigr) \to 0.$$

See Lemma 3 and its proof. However, invertibility problems can occur when the noise $\xi$ has a mass at point 0,
for instance. For simplicity, we always assume that all these matrices are invertible. Studying our estimator
on an event with probability tending to one only complicates the statements and proofs of our results by
adding some indicator functions but does not change the approach. One can also show that this distinction
is unnecessary for a noise $\xi$ having a density.

Asymptotic normality of the estimates (2.6) and (2.7) will be derived in the next section as well as some 
plug-in versions to get optimal asymptotic results. 
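The two-step construction of this section (kernel estimates of the projection coefficients, then weighted least squares on the residuals) can be sketched as follows. This is a minimal illustration under stated assumptions and not the author's Matlab implementation: the names `build_design` and `estimate_beta` are hypothetical, the Epanechnikov kernel is one possible kernel with support $[-1,1]$, and unit weights are used by default for simplicity.

```python
import numpy as np

def epanechnikov(x):
    """Kernel with support [-1, 1] (an illustrative choice)."""
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x ** 2), 0.0)

def build_design(x, m_lags, n_lags):
    """Return (y, M, N) for t = p+1..T, with y_t = X_t^2.
    Lag 0 stands for the intercept; m_lags are the time-varying
    coefficients, n_lags the constant ones."""
    x2 = np.asarray(x, dtype=float) ** 2
    p = max(list(m_lags) + list(n_lags))
    idx = np.arange(p, len(x2))
    col = lambda lag: np.ones(len(idx)) if lag == 0 else x2[idx - lag]
    M = np.column_stack([col(q) for q in m_lags])
    N = np.column_stack([col(r) for r in n_lags])
    return x2[idx], M, N

def estimate_beta(y, M, N, b, T, W=None, kernel=epanechnikov):
    """Sketch of the estimator beta_hat of the constant coefficients."""
    L = len(y)
    W = np.ones(L) if W is None else np.asarray(W, dtype=float)
    pos = np.arange(L)
    k = kernel((pos[:, None] - pos[None, :]) / (T * b))  # kernel weights
    k /= k.sum(axis=1, keepdims=True)
    V = np.empty(L)
    O = np.empty_like(N, dtype=float)
    for t in range(L):
        WM = M * (k[t] * W)[:, None]
        s3 = WM.T @ M
        q1 = np.linalg.solve(s3, WM.T @ y)   # local projection of X^2 on M
        q2 = np.linalg.solve(s3, WM.T @ N)   # local projection of N on M
        V[t] = y[t] - M[t] @ q1
        O[t] = N[t] - q2.T @ M[t]
    WO = O * W[:, None]
    return np.linalg.solve(WO.T @ O, WO.T @ V)
```

For an observed series `x` of length `T`, a call such as `estimate_beta(*build_design(x, [0], [1]), b=0.1, T=len(x))` would treat the intercept as time-varying and the first lag coefficient as constant. The dense T by T kernel matrix keeps the sketch short; a windowed loop would be preferable for long series.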

3 Asymptotic results 

3.1 Estimation of the parametric component 

Our first result shows that the estimator (2.6) is $\sqrt{T}$-consistent under some conditions. Here are our main
assumptions. 

A1. For $j = 1, \ldots, p$, the function $a_j$ is non-negative and Lipschitz continuous. The function $a_0$ is positive
and Lipschitz continuous. Moreover,

$$c = \sup_{u \in [0,1]} \sum_{j=1}^{p} a_j(u) < 1.$$


A2(h). For the integer h >2, there exists a real number 6 € (0,1) such that < 00 . 


Assumption A1 is the classical contraction condition used in Fryzlewicz et al. (2008) to define tv-ARCH
processes. Assumption A2(h), used for different values of $h$ in the sequel, imposes a restriction on the noise
distribution. Let us mention that this condition does not restrict the moment condition for the marginal $X_t$
(in the stationary case, i.e. when the $a_j$'s are constant, assumption A1 is the necessary and sufficient
condition for $E X_t^2 < +\infty$).
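As a quick numerical illustration of the contraction condition in A1, the sketch below approximates $c = \sup_u \sum_j a_j(u)$ on a regular grid. The helper name `contraction_constant` and the example coefficient curves are assumptions made for illustration; a grid maximum only approximates the supremum.

```python
import numpy as np

def contraction_constant(lag_funcs, grid_size=1001):
    """Approximate c = sup_u sum_{j>=1} a_j(u) on a regular grid of [0, 1]."""
    u = np.linspace(0.0, 1.0, grid_size)
    total = np.zeros_like(u)
    for a_j in lag_funcs:
        total += a_j(u) * np.ones_like(u)   # also handles constant coefficients
    return float(total.max())

# Hypothetical lag coefficients a_1(u) = 0.1 + 0.3u and a_2 = 0.2.
lag_funcs = [lambda u: 0.1 + 0.3 * u, lambda u: 0.2]
c = contraction_constant(lag_funcs)
satisfies_a1 = c < 1.0
```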

In the sequel, $(X_t(u))_t$ will denote the stationary ARCH process with coefficients $(\alpha(u), \beta)$. Then we will
use the notation $M_t(u)$ (resp. $N_t(u)$, $W_t(u)$) for the stationary approximation of $(M_t)_t$ (resp. $(N_t)_t$,
$(W_t)_t$). For example,

$$W_t(u) = \Bigl(\gamma_0(u) + \sum_{\ell=1}^{p} \gamma_\ell(u)\, X_{t-\ell}^2(u)\Bigr)^{-2}.$$


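Theorem 1. Assume that assumptions A1 and A2(4) hold, $b\sqrt{T} \to \infty$ and
$\sqrt{T}\,b^2 \to 0$. Then we have the following convergence in distribution:

$$\sqrt{T}\,\bigl(\hat\beta - \beta\bigr) \Rightarrow \mathcal{N}_n\bigl(0, \Sigma_1^{-1}\Sigma_2\Sigma_1^{-1}\bigr) \quad \text{as } T \to \infty,$$

with

$$\Sigma_1 = E\int_0^1 W_1(u)\,\bigl(N_1(u) - q_2(u)'M_1(u)\bigr)\bigl(N_1(u) - q_2(u)'M_1(u)\bigr)'\,du,$$

$$\Sigma_2 = \mathrm{Var}(\xi_0^2)\cdot E\int_0^1 W_1(u)^2\sigma_1(u)^4\,\bigl(N_1(u) - q_2(u)'M_1(u)\bigr)\bigl(N_1(u) - q_2(u)'M_1(u)\bigr)'\,du,$$

and

$$q_2(u) = E^{-1}\bigl(W_1(u)M_1(u)M_1(u)'\bigr)\,E\bigl(W_1(u)M_1(u)N_1(u)'\bigr).$$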


Notes. 


1. The bandwidth conditions used in Theorem 1 are classical for estimating the parametric component
in partially linear models. With this restriction, the nonparametric estimation step involved in the
expression of $\hat\beta$ becomes negligible (i.e. for $i = 1, 2$, $\hat q_{i,b,t}$ can be replaced with $q_{i,t}$ without changing
the asymptotic behavior of $\hat\beta$). Let us explain the role of these conditions. Some nonparametric
estimates are introduced to approximate the two ratios $q_{1,t}$ and $q_{2,t}$. But, up to $C/T$ ($C$ denotes a
positive constant), these two ratios can be seen as Lipschitz functions of $t/T$ (see Lemma 5 in the
supplementary material). The mean square error for the kernel estimation of a Lipschitz function
in a regression model with deterministic design is bounded by $b^2 + \frac{1}{Tb}$ (up to a constant). Then our
bandwidth conditions entail that this mean square error converges to zero at a faster rate than $1/\sqrt{T}$.

2. The goal of the proof of Theorem 1 is to show that the asymptotic distribution of $\sqrt{T}(\hat\beta - \beta)$ is the
same as if the two quantities $\hat q_{1,b,t}$, $\hat q_{2,b,t}$ were replaced by $q_{1,t}$, $q_{2,t}$ respectively. Hence, sums
involving differences between these quantities are shown to be negligible. To this end, we
make Taylor expansions and bound the variance of some multiple weighted sums appearing in these
expansions using Lemma 4 given in the supplementary material.

3. The asymptotic variance in Theorem 1 can be estimated consistently using the data. Indeed, the proof
of Theorem 1 given in the supplementary material shows that

$$\lim_{T \to +\infty} \frac{1}{T}\sum_{t=p+1}^{T} W_t\hat O_t\hat O_t' = \Sigma_1 \quad \text{a.s.}$$

Moreover, we have

$$\Sigma_2 = E\int_0^1 W_1(u)^2\bigl(X_1(u)^2 - \sigma_1(u)^2\bigr)^2\bigl(N_1(u) - q_2(u)'M_1(u)\bigr)\bigl(N_1(u) - q_2(u)'M_1(u)\bigr)'\,du. \tag{3.1}$$

Then an estimate of $\Sigma_2$ can be obtained if we replace $q_2$ with $\hat q_{2,b,\cdot}$ and $\sigma_t^2$ with a pointwise estimate
(see Theorem 3), using the same kind of empirical counterpart as for $\Sigma_1$.
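The asymptotic variance given in Theorem 1 depends on the weights $W_t(u)$. One can show that its minimal
value (in the sense of non-negative definite matrices) is obtained for the choice $W_t^* = \sigma_t^{-4}$. Indeed, setting
$O_1(u) = N_1(u) - q_2(u)'M_1(u)$ and $O_1^*(u) = N_1(u) - q_2^*(u)'M_1(u)$ where

$$q_2^*(u) = E^{-1}\Bigl(\frac{M_1(u)M_1(u)'}{\sigma_1(u)^4}\Bigr)\,E\Bigl(\frac{M_1(u)N_1(u)'}{\sigma_1(u)^4}\Bigr),$$

we have for all $u \in [0,1]$, $E\bigl(W_1(u)O_1(u)O_1(u)'\bigr) = E\bigl(W_1(u)O_1(u)O_1^*(u)'\bigr)$, because $E\bigl(W_1(u)O_1(u)M_1(u)'\bigr) = 0$ by definition of $q_2(u)$. Hence

$$\Sigma_1 = E\int_0^1 W_1(u)O_1(u)O_1^*(u)'\,du = E\int_0^1 W_1(u)\sigma_1(u)^2\,O_1(u)\,\frac{O_1^*(u)'}{\sigma_1(u)^2}\,du.$$

Then if $x, y \in \mathbb{R}^n$, we have from the Cauchy-Schwarz inequality

$$(x'\Sigma_1 y)^2 \le \frac{x'\Sigma_2 x}{\mathrm{Var}(\xi_0^2)} \times y'\Sigma y,$$

where $\Sigma = E\int_0^1 \frac{O_1^*(u)O_1^*(u)'}{\sigma_1(u)^4}\,du$. Setting $x = \Sigma_1^{-1}z$ and $y = \Sigma^{-1}z$ for $z \in \mathbb{R}^n$, we get

$$\bigl(z'\Sigma^{-1}z\bigr)^2 \le \frac{z'\Sigma_1^{-1}\Sigma_2\Sigma_1^{-1}z}{\mathrm{Var}(\xi_0^2)} \times z'\Sigma^{-1}z.$$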

Then we have proved the following result. 

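Proposition 1. The lower bound for the asymptotic variance given in Theorem 1 is $\mathrm{Var}(\xi_0^2)\,\Sigma^{-1}$, where

$$\Sigma = \int_0^1 E\Bigl[\frac{1}{\sigma_1(u)^4}\bigl(N_1(u) - q_2^*(u)'M_1(u)\bigr)\bigl(N_1(u) - q_2^*(u)'M_1(u)\bigr)'\Bigr]du,$$

where

$$q_2^*(u) = E^{-1}\Bigl(\frac{M_1(u)M_1(u)'}{\sigma_1(u)^4}\Bigr)\,E\Bigl(\frac{M_1(u)N_1(u)'}{\sigma_1(u)^4}\Bigr).$$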

Now, we show that it is possible to construct an estimate of parameter $\beta$ which has the asymptotic
variance given in Proposition 1. A natural candidate is obtained by replacing the weights $W_t$ in (2.6) with
an estimation of the optimal weights $W_t^* = \sigma_t^{-4}$. We set $\hat W_t^* = \cdots$, where

$$\hat\sigma_t^2 = M_t'\bigl(\hat q_{1,b,t} - \hat q_{2,b,t}\hat\beta\bigr) + N_t'\hat\beta$$

is an estimator of $\sigma_t^2$. The sequence $(\nu_T)_T$ is a sequence of positive real numbers such that $\nu_T = o(\cdots)$. The
use of this sequence is just technical and avoids possible small values for the fitted volatility, which is not
ensured to be bounded away from 0 for finite samples. However, for a large sample size, our simulations
show that the choice $\nu_T = 0$ does not alter the performance of the plug-in estimate. Let us define the
quantities

$$\hat s^*_{3,b,t} = \sum_{i=p+1}^{T} k_{t,i}(b)\,\hat W_i^*M_iM_i', \qquad \hat s^*_{1,b,t} = \sum_{i=p+1}^{T} k_{t,i}(b)\,\hat W_i^*M_iX_i^2.$$

We also set

$$\hat s^*_{2,b,t} = \sum_{i=p+1}^{T} k_{t,i}(b)\,\hat W_i^*M_iN_i', \qquad \hat q^*_{1,b,t} = \bigl(\hat s^*_{3,b,t}\bigr)^{-1}\hat s^*_{1,b,t}, \qquad \hat q^*_{2,b,t} = \bigl(\hat s^*_{3,b,t}\bigr)^{-1}\hat s^*_{2,b,t}.$$

Now we introduce the following notations in order to simplify the expression of our estimator.

$$\hat V_t^* = X_t^2 - M_t'\hat q^*_{1,b,t}, \qquad \hat O_t^* = N_t - \bigl(\hat q^*_{2,b,t}\bigr)'M_t.$$

Our plug-in estimate of parameter $\beta$ is now defined by

$$\hat\beta^* = \Bigl(\sum_{t=p+1}^{T} \hat W_t^*\hat O_t^*\bigl(\hat O_t^*\bigr)'\Bigr)^{-1}\sum_{t=p+1}^{T} \hat W_t^*\hat O_t^*\hat V_t^*.$$

With respect to Theorem 1, we impose more restrictive assumptions. 


A3. $\xi_t$ has moments of any order.

A4. For $j = 0, \ldots, p$, the coefficient $a_j$ is a positive function.

Theorem 2. Assume that the assumptions Al, A3 and A4 hold and bT^~'^ oo, 0 for some 

T € (0,1 /4). Then we have 

^ff(p, -p) 

The proof of Theorem 2 is similar to that of Theorem 1 but involves more tedious arguments. A detailed 
proof of Theorem 2 is given in the supplementary material. 


3.2 Estimation of the nonparametric component 

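Now let us investigate the asymptotic properties of the time-varying coefficients estimate $\hat\alpha$ defined by (2.7).
The estimator $\hat\beta$ appearing in the expression (2.7) is constructed using the initial bandwidth parameter $b$
which satisfies the assumptions of Theorem 1.

Theorem 3. Let $u \in [0,1]$. Assume that the assumptions of Theorem 1 hold. Then if $b' \to 0$, $b'T \to \infty$ and
$t = t_T$ satisfies $|t/T - u| \le 1/T$,

$$\sqrt{Tb'}\,\bigl(\hat\alpha_t - \alpha(u)\bigr) + \sqrt{Tb'}\,E^{-1}\bigl(W_1(u)M_1(u)M_1(u)'\bigr)\bigl(A_t(u) - A_t^*(u)\bigr) \Rightarrow \mathcal{N}_m\bigl(0, V(u)\bigr),$$

where

$$V(u) = \mathrm{Var}(\xi_0^2)\cdot\int K(x)^2dx \cdot E^{-1}\bigl(W_1(u)M_1(u)M_1(u)'\bigr)\,E\bigl(W_1(u)^2\sigma_1(u)^4M_1(u)M_1(u)'\bigr)\,E^{-1}\bigl(W_1(u)M_1(u)M_1(u)'\bigr),$$

$$A_t(u) = \hat s_{1,b',t} - \hat s_{2,b',t}\,\beta - \hat s_{3,b',t}\,\alpha(u),$$

$$A_t^*(u) = \sum_{i=p+1}^{T} k_{t,i}(b')\,W_i(u)M_i(u)\bigl[X_i(u)^2 - N_i(u)'\beta - M_i(u)'\alpha(u)\bigr].$$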
This result is similar to that obtained in Fryzlewicz et al. (2008), Proposition 3. The part $A_t(u) - A_t^*(u)$
can be interpreted as a deviation term with respect to stationarity. As pointed out in Fryzlewicz et al.
(2008), this term satisfies $A_t(u) - A_t^*(u) = O_P(b')$. One can easily check that the optimal asymptotic variance
in Theorem 3 corresponds to the choice $W_t^* = \sigma_t^{-4}$ for the weights. Thus, a plug-in approach is natural. We

set IT*. = —, where 

&l = M'.at + N;p 

and $(\mu_T)_T$ is a sequence of positive real numbers which now plays the role of the sequence $(\nu_T)_T$ previously
used for the optimal estimator $\hat\beta^*$. Once again, this choice is only technical and we impose here $\mu_T =$

O ^b' + j- Then we define the following estimator of parameter at- 

~ ^2,b',tl^ ^ 

where for j = 1,2,3, Sj^b',t is obtained as sj^b'j but replacing IT, with lT*y. 

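Theorem 4. Assume that assumptions A1, A2(4) and A4 hold, $b'T \to \infty$ and $\sqrt{Tb'}\,(b')^{\ell_0} \to 0$ for a given
integer $\ell_0$. Then, if $|t/T - u| \le 1/T$, we have

$$\sqrt{Tb'}\,\bigl(\hat\alpha_t^* - \alpha(u)\bigr) + \sqrt{Tb'}\,E^{-1}\Bigl(\frac{M_1(u)M_1(u)'}{\sigma_1(u)^4}\Bigr)\bigl(B_t(u) - B_t^*(u)\bigr) \Rightarrow \mathcal{N}_m\bigl(0, V^*(u)\bigr),$$

where

$$V^*(u) = \mathrm{Var}(\xi_0^2)\cdot\int K(x)^2dx \cdot E^{-1}\Bigl(\frac{M_1(u)M_1(u)'}{\sigma_1(u)^4}\Bigr),$$

$$B_t(u) = \hat s^*_{1,b',t} - \hat s^*_{2,b',t}\,\beta - \hat s^*_{3,b',t}\,\alpha(u),$$

$$B_t^*(u) = \sum_{i=p+1}^{T} k_{t,i}(b')\,W_i^*(u)M_i(u)\bigl[X_i(u)^2 - N_i(u)'\beta - M_i(u)'\alpha(u)\bigr].$$

Moreover, $B_t(u) - B_t^*(u) = O_P(b')$.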


Notes 


1. Compared to Theorem 3, Theorem 4 uses a more restrictive assumption on the bandwidth $b'$. However, for powers of
the sample size, i.e. $b' = CT^{-\iota}$ with constants $C, \iota > 0$, the conditions are equivalent.

2. When all the coefficients of the volatility are time-varying, replacing $M_t$ by $(1, X_{t-1}^2, \ldots, X_{t-p}^2)'$, we
recover the expression of the optimal asymptotic variance given in Fryzlewicz et al. (2008) for the (local)
weighted least-squares estimation. This asymptotic variance coincides with that obtained with the
local QML estimator studied in Dahlhaus and Subba Rao (2006). However, a crucial assumption in
Theorem 4 is the positivity of all the coefficients of the volatility. To avoid this restriction, Fryzlewicz
et al. (2008) consider the sequence of weights $\tilde W_t = \bigl(\hat\mu_t + \sum_{j=1}^{p} X_{t-j}^2\bigr)^{-2}$, where $\hat\mu_t$ is a nonparametric
estimate of $E X_t^2$. For the nonparametric estimation of the whole set of coefficients, they show that the
corresponding weighted least squares estimator is asymptotically normal, even if the ARCH coefficients
are only nonnegative, but at the price of a small loss of efficiency. We claim that the weights
$\tilde W_t$ can also be used in our context to obtain a result similar to Fryzlewicz et al. (2008), Proposition 4.
Details are omitted.

3. As usual for this kind of nonparametric estimation, a better finite sample approximation of the distribution
of the parameter estimators can often be obtained using bootstrap methods. With straightforward
modifications, it is possible to use the bootstrap method studied in Fryzlewicz et al. (2008) (see Section 5
of that paper) to obtain pointwise confidence intervals for the components of $\alpha$. Since this paper
is mainly devoted to testing and estimating some non time-varying coefficients, we will not consider
this bootstrap scheme.

4. Detailed proofs of Theorem 3 and Theorem 4 are given in the supplementary material.

3.3 Asymptotic semiparametric efficiency 

For Gaussian inputs (i.e. $\xi \sim \mathcal{N}(0, 1)$), it is possible to show that the matrix given in Proposition 1 is
a lower bound in semiparametric estimation. We refer the reader to Bickel et al. (1998) for a general
introduction to semiparametric models and the problem of efficient estimation of a finite dimensional parameter
in such models. In our case, the problem of semiparametric efficiency for estimating the parameter $\beta$
involves triangular arrays. This is why we will use an abstract result using the classical formalism presented
in van der Vaart and Wellner (1996) (see Chapter 3.11). Intuitively, one can see the matrix $2\Sigma^{-1}$ as the
smallest asymptotic variance obtained for estimating $\beta$ in submodels for which the nuisance parameter $\alpha$
is projected onto a finite dimensional space of square integrable functions. Formally, the approach consists
in writing a LAN expansion of the likelihood ratio and then using a general convolution theorem. In the
sequel, we set for $m, n \ge 1$, $\mathbb{H} = \bigl(L^2([0,1])\bigr)^m \times \mathbb{R}^n$. Then $\mathbb{H}$ is a Hilbert space for the classical scalar product



However, in the sequel, the space $\mathbb{H}$ will be endowed with an equivalent scalar product defined by



where



Now, we denote by $\mathcal{L}$ the set of Lipschitz functions $f : [0,1] \to \mathbb{R}$ and we set $H = \mathcal{L}^m \times \mathbb{R}^n$ where $m$ (resp.
$n$) is the dimension of vector $M_t$ (resp. $N_t$). Then $H$ is a linear subspace of $\mathbb{H}$. The set $(H, \langle\cdot,\cdot\rangle_{\mathbb{H}})$ will
be referred to as the tangent space. For Gaussian inputs, we first derive a LAN expansion for the (conditional)
likelihood ratio. We denote by $P_{T,\alpha,\beta}$ the conditional distribution of $(X_{p+1}, \ldots, X_T)$ given $X_1 = x_1, \ldots, X_p = x_p$.


Proposition 2. Assume that ig,h)) € where the coordinates of a and f are positive. Then we 

have 





1 


where 



Moreover, 
From this LAN expansion, we derive a lower bound for the asymptotic variance of regular estimators of $\beta$.

For $(g,h)$ in the tangent space, we set $\kappa_T(g,h) = \beta + \frac{h}{\sqrt{T}}$. Then, the sequence of parameters $\kappa_T(g,h)$ is
regular: if $\kappa$, taking values in $\mathbb{R}^n$, is the projection operator defined by $\kappa(g,h) = h$, then

$$\sqrt{T}\,\bigl(\kappa_T(g,h) - \kappa_T(0,0)\bigr) = h.$$


Corollary 1. If the assumptions of Proposition 2 hold, then the adjoint operator $\kappa^*$ of $\kappa$, defined on $\mathbb{R}^n$, is given
by


K V = 


In 




V, V € 


Consequently, the limit distribution of a regular estimator of $\beta$ equals the distribution of a sum $L_1 + L_2$ of
independent random vectors of $\mathbb{R}^n$ such that

$$L_1 \sim \mathcal{N}_n\bigl(0, 2\Sigma^{-1}\bigr).$$


4 Statistical testing 

4.1 Testing parameter constancy 

For a real data set, it is necessary, before applying the methodology given in Section 2, to test whether a coefficient
vector of the form $\beta = (a_{r_1}, \ldots, a_{r_n})'$ is time-varying or not in model (1.1). This is equivalent to testing model
(2.1) versus model (1.1). When $n = p$ and $\beta = (a_1, \ldots, a_p)'$, such a statistical test is useful for deciding whether
model (1.3) is a convenient restriction of the tv-ARCH model. This case is of particular interest for real data
applications. In Zhang and Wu (2012), a procedure is proposed for testing whether some coefficients are constant
in a general time-varying regression model. The null hypothesis is $H_0$: $\beta(\cdot)$ constant. The test statistic used
in Zhang and Wu (2012) is based on a distance between an estimate under the alternative and an estimate
under the null hypothesis. In this part, we derive asymptotic properties of this test for tv-ARCH processes.
For simplicity, we will only consider some estimates without plug-in (i.e. we fix a sequence of weights $(W_t)_t$
of the form (2.3) and use the corresponding least-squares estimates). Let us first introduce some additional
notations.

For a function $f : [-1,1] \to \mathbb{R}$, we set $\|f\|_2^2 = \int_{-1}^{1} f(u)^2du$ and for $x \in [-1,1]$,

$$K^*(x) = \int_{-1}^{1-2|x|} K(v)K(v + 2|x|)\,dv.$$
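The kernel constants defined above are easy to evaluate numerically. The sketch below does so with a plain trapezoidal rule; the Epanechnikov kernel, the grid sizes and the helper names `k_star` and `sq_norm` are illustrative assumptions and are not prescribed by the paper.

```python
import numpy as np

def epanechnikov(x):
    """Kernel with support [-1, 1] (an illustrative choice)."""
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x ** 2), 0.0)

def trapezoid(vals, grid):
    """Trapezoidal rule, written out to avoid NumPy version differences."""
    return float(np.sum((vals[1:] + vals[:-1]) * np.diff(grid)) / 2.0)

def k_star(x, kernel=epanechnikov, n_grid=4001):
    """K*(x) = int_{-1}^{1-2|x|} K(v) K(v + 2|x|) dv for |x| <= 1."""
    upper = 1.0 - 2.0 * abs(x)
    if upper <= -1.0:
        return 0.0                      # empty integration range
    v = np.linspace(-1.0, upper, n_grid)
    return trapezoid(kernel(v) * kernel(v + 2.0 * abs(x)), v)

def sq_norm(f, n_grid=4001):
    """Squared L2 norm of f over [-1, 1]."""
    u = np.linspace(-1.0, 1.0, n_grid)
    return trapezoid(f(u) ** 2, u)

norm_K_sq = sq_norm(epanechnikov)                        # ||K||_2^2
norm_Kstar_sq = sq_norm(np.vectorize(k_star), n_grid=401)  # ||K*||_2^2
```

Note that $K^*(0) = \|K\|_2^2$ for a kernel supported on $[-1,1]$, which gives a simple sanity check of the numerical integration.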

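Setting for $u \in [0,1]$, $e_i(u) = \frac{1}{Tb}K\bigl(\frac{u - i/T}{b}\bigr)$ and for $p + 1 \le t \le T$, $\mathcal{X}_t = (M_t', N_t')'$, the kernel estimate of the
full vector of ARCH coefficients $a(u) = (\alpha(u)', \beta(u)')'$ is given by (see Fryzlewicz et al. (2008))

$$\hat a(u) = \hat S_u^{-1}\sum_{i=p+1}^{T} e_i(u)\,W_iX_i^2\mathcal{X}_i,$$

where $\hat S_u = \sum_{i=p+1}^{T} e_i(u)\,W_i\mathcal{X}_i\mathcal{X}_i'$. Then we set $\hat\beta(u) = A\hat a(u)$ where $A$ is the matrix of size $n \times (p+1)$ defined
by $Ax = (x_{m+1}, \ldots, x_{p+1})'$. Note also that $\beta(u) = Aa(u)$. We also set $S_u = E\bigl(W_1(u)\mathcal{X}_1(u)\mathcal{X}_1(u)'\bigr)$ and

$$\Omega(u) = S_u^{-1}E\bigl(W_1(u)^2\bigl[X_1(u)^2 - \sigma_1(u)^2\bigr]^2\mathcal{X}_1(u)\mathcal{X}_1(u)'\bigr)S_u^{-1} = \mathrm{Var}(\xi_0^2)\cdot S_u^{-1}E\bigl(W_1(u)^2\sigma_1(u)^4\mathcal{X}_1(u)\mathcal{X}_1(u)'\bigr)S_u^{-1}.$$

Let $(\Gamma(u))_{u \in [0,1]}$ be a family of positive definite matrices of size $n \times n$ such that $u \mapsto \Gamma(u)$ is a
Lipschitz function. Finally, we set for $j = 1, 2$,

$$\varpi_j = \int_0^1 \mathrm{Tr}\Bigl[\bigl(\Gamma(u)^{1/2}A\,\Omega(u)A'\,\Gamma(u)^{1/2}\bigr)^j\Bigr]du.$$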

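We define our test statistic $S_T(\hat a, \beta)$ by

$$S_T(\hat a, \beta) = \int_0^1 \bigl(\hat\beta(u) - \beta\bigr)'\,\Gamma(u)\,\bigl(\hat\beta(u) - \beta\bigr)\,du. \tag{4.1}$$

The proof of the following theorem is given in the supplementary material.

Theorem 5. Assume that assumptions A1, A2(8) hold and $\beta$ is non time-varying. Then if $Tb^{\cdots} \to \infty$ and
$Tb^{\cdots} \to 0$, we have

$$T\sqrt{b}\,\Bigl(S_T(\hat a, \beta) - \cdots\Bigr) \Rightarrow \mathcal{N}\bigl(0, 4\|K^*\|_2^2\,\varpi_2\bigr). \tag{4.2}$$

Moreover (4.2) holds if $S_T(\hat a, \beta)$ is replaced with $S_T(\hat a, \hat\beta)$ where $\hat\beta$ is the estimate of Theorem 1.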


Notes 

1. As pointed out in Zhang and Wu (2012), if we are interested in prediction, the matrix $\Gamma$ can be chosen
as the asymptotic variance of the kernel estimate $\hat\beta(\cdot)$ (which has to be estimated in practice). In our
numerical studies, we will use the simple choice $\Gamma(u) = I_n$ where $I_n$ denotes the identity matrix of size $n$.

2. Quantities $\varpi_1$ and $\varpi_2$ involved in the bias and asymptotic variance in (4.2) can be estimated
consistently, taking empirical counterparts. Then we obtain a pivotal statistic
$\mathcal{S}_T = T\sqrt{b}\,\bigl(S_T(\hat a, \hat\beta) - \cdots\bigr) / \bigl(2\|K^*\|_2\sqrt{\hat\varpi_2}\bigr)$ and one can reject the null hypothesis for large
values of this statistic. However, in practice, such nonparametric tests suffer from the slow convergence
in Theorem 5. As in Zhang and Wu (2012), one can use a Monte Carlo type procedure which can
improve the finite-sample performance (a similar Monte Carlo procedure is also used in Patilea and
Raissi (2014)). Note that the result of Theorem 5 is valid for i.i.d. series with a standard Gaussian
marginal distribution. In particular, if $\mathcal{S}_T^*$ denotes the pivotal statistic computed with an i.i.d. sample
of standard Gaussian variables, we have $\lim_{T \to \infty}\mathcal{S}_T = \lim_{T \to \infty}\mathcal{S}_T^* = \mathcal{N}(0,1)$ in distribution. Then
one can use the quantiles of the distribution of $\mathcal{S}_T^*$ to compute the critical values for the test (instead
of the Gaussian quantiles). Let us give the details of the method proposed in Zhang and Wu (2012).
We assume that the bandwidth $b$ has been already selected.

• First simulate $B$ samples of size $T$ of i.i.d. Gaussian random variables. For each sample, compute
the values of the estimators $\hat\beta(\cdot)$ and $\hat\beta$ as well as the realization of the pivotal statistic $\mathcal{S}_T^*$.

• Then, from these $B$ realizations of the random variable $\mathcal{S}_T^*$, compute the empirical quantile
$q_{MC}(1 - \alpha)$ of order $1 - \alpha$.

• Reject $H_0$ if $\mathcal{S}_T$ is greater than $q_{MC}(1 - \alpha)$.

3. In Zhang and Wu (2012), the power of the test under some local alternatives is studied (see Theorem 
3.2 of this paper). A similar result can be derived here under the assumptions of Theorem 5. In 

particular, for some local alternatives of the form jSi(t/r) = /3 + ZrfitlT), with 1 / yfb - o{zt) and 
/ a Lipschitz function defined over [0,1], the power of the test still converges to 1. 
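The Monte Carlo calibration described in Note 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the paper: `toy_pivotal_statistic` is a hypothetical stand-in (a standardized mean of squares, asymptotically standard Gaussian for i.i.d. Gaussian data) for the pivotal statistic, whose computation requires the kernel estimates and the selected bandwidth. Only the three-step structure (simulate Gaussian samples, recompute the statistic, take the empirical quantile) follows the text.

```python
import math
import random


def monte_carlo_critical_value(statistic, sample_size, n_rep, level, seed=0):
    """Empirical (1 - level)-quantile of `statistic` over n_rep i.i.d. N(0, 1) samples."""
    rng = random.Random(seed)
    values = sorted(
        statistic([rng.gauss(0.0, 1.0) for _ in range(sample_size)])
        for _ in range(n_rep)
    )
    # Smallest order statistic whose empirical CDF is at least 1 - level.
    index = max(0, min(n_rep - 1, math.ceil((1.0 - level) * n_rep) - 1))
    return values[index]


def toy_pivotal_statistic(sample):
    """Hypothetical stand-in for the pivotal statistic (NOT the one of the paper).

    For i.i.d. N(0, 1) data, sqrt(T) * (mean(x^2) - 1) / sqrt(2) is
    asymptotically N(0, 1).
    """
    t = len(sample)
    mean_sq = sum(x * x for x in sample) / t
    return math.sqrt(t) * (mean_sq - 1.0) / math.sqrt(2.0)


def reject_null(observed, critical_value):
    """Reject H0 when the observed statistic exceeds the Monte Carlo quantile."""
    return observed > critical_value


critical = monte_carlo_critical_value(
    toy_pivotal_statistic, sample_size=200, n_rep=2000, level=0.05
)
```

The simulated samples must have the same size $T$ as the data and the statistic must be computed with the same bandwidth, so that the finite-sample distortion is reproduced in the reference distribution.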

4.2 Testing if a constant parameter is equal to zero 

In this part, we consider model (2.1) and our goal is to test whether the vector $\beta$ is equal to zero. Two
approaches are discussed below.

1. One possibility is to use the asymptotic normality of the estimator $\hat\beta$ given in Theorem 1. Under the
null hypothesis $\beta = 0$, the asymptotic distribution of $\sqrt{T}\hat\beta$ is that of a centered Gaussian vector
with covariance matrix $\mathcal{V}$. We have already discussed how to estimate the covariance
matrix $\mathcal{V}$. If $\hat{\mathcal{V}}$ denotes such an estimate, the statistic $T\|\hat{\mathcal{V}}^{-1/2}\hat\beta\|^2$ is asymptotically distributed as a
$\chi^2$ with $n$ degrees of freedom (here $\|\cdot\|$ denotes the Euclidean norm on $\mathbb{R}^n$). As for the test given
in the previous subsection, one can use a Monte Carlo method instead of using the quantiles of the
asymptotic distribution (the convergence in distribution of the previous statistic is quite slow because
of the incorporation of nonparametric kernel estimates). If a bandwidth $b$ is selected, one can simulate
$B$ samples of Gaussian i.i.d. random variables and compute the corresponding values of our statistic.
From these values, we can compute the empirical quantile $q_{MC,\alpha}$ of order $1 - \alpha$ and reject the null
hypothesis if $T\|\hat{\mathcal{V}}^{-1/2}\hat\beta\|^2 > q_{MC,\alpha}$. This test has an asymptotic level $\alpha$ and a power converging to 1
under a fixed alternative. However, such an approach is not completely natural because the value $\beta = 0$ is
on the boundary of the parameter space: our test is similar to the two-sided test of the hypothesis
$\beta = 0$ in regression models, and we ignore the sign of $\hat\beta$. This results in a loss of power, and it is
more natural to consider a statistic based on the random vector $\sqrt{T}\left(\max(\hat\beta_i, 0)\right)_{1\le i\le n}$. The vector
$\left(\max(\hat\beta_i, 0)\right)_{1\le i\le n}$ will be called the truncated least squares estimator. As discussed in Francq and Zakoian
(2008) for stationary ARCH processes, truncated least squares estimators are natural for testing whether
some lag coefficients are equal to zero. However, the limiting distribution of this truncated random
vector is that of $\left(\max(Z_i, 0)\right)_{1\le i\le n}$, where $Z = (Z_1, \ldots, Z_n)$ is a Gaussian vector with dependent entries
in general. Hence, except if $\mathrm{Var}(Z)$ is diagonal, it is not possible to get a pivotal statistic from truncated
least squares estimators, and the Monte Carlo method used for testing parameter constancy cannot
be applied. Note also that bootstrapping the model is not appropriate here because the bootstrap
is generally inconsistent for testing a parameter on the boundary (see for instance Andrews (2000)
for this problem). However, when $\beta$ denotes the full vector of lag coefficients, it is possible to use
truncated least squares and the Monte Carlo method, provided $W_t = 1$. This point is discussed below.

2. When $\beta$ is the full vector of lag coefficients, the problem is to test model (1.2) versus model (1.3). For
testing whether the lag coefficients are equal to zero in model (1.3), our test is based on the following result.
The following nofafion will be used. If p + 1 < t < T, we sef = Yj=p+\ Note fhaf dt is an 

esfimafor of
Proposition 3. Assume that A2(4) holds and that b e [cT '\CT with \ < h < ^ - 2 (i+s) ’ where
c, C are positive constants. Then under Hq, 



T 

/ 

p 

(di,. 

.,ap) = arg min V 

' ai,-,ap 

t=p+\ 

Xi-a 

\ 

7=1 


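satisfies $T\sum_{j=1}^{p}\max(\hat a_j, 0)^2 \to \sigma^2\sum_{j=1}^{p}\max(Z_j, 0)^2$ in distribution, where $Z$ is a standard Gaussian vector and

$$\sigma^2 = \frac{\int_0^1 \mathrm{Var}\left(X_0(u)\right)^2 du}{\left(\int_0^1 \mathrm{Var}\left(X_0(u)\right) du\right)^2}.$$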


Testing the second order dynamic. From this result, we reject $H_0$ for large values of the statistic
$\mathcal{T}_T = T\sum_{j=1}^{p}\max(\hat a_j, 0)^2/\hat\sigma^2$, where $\hat\sigma^2$ is a consistent estimator of $\sigma^2$. Typically one can choose

$$\hat\sigma^2 = \frac{T\sum_{t=1}^{T}\hat a_0(t/T)^2}{\left(\sum_{t=1}^{T}\hat a_0(t/T)\right)^2}, \quad \text{with } \hat a_0(t/T) = \hat\sigma_t^2,$$

which gives a consistent estimate under $H_0$. A first solution is to reject $H_0$ if $\mathcal{T}_T$ is larger than the
quantile of order $1 - \alpha$ of the distribution of the random variable $\sum_{j=1}^{p}\max(Z_j, 0)^2$. But the statistic
$\mathcal{T}_T$ is also asymptotically pivotal and its quantiles can be approximated by a Monte Carlo procedure
similar to that used for testing parameter constancy.
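The first solution above only needs the quantiles of a sum of squared positive parts of independent standard Gaussian variables, which can be approximated by simulation. The sketch below is an illustration under assumptions: the statistic is taken to be $T\sum_j\max(\hat a_j,0)^2/\hat\sigma^2$ with $\hat\sigma^2 = T\sum_t \hat a_0(t/T)^2/(\sum_t \hat a_0(t/T))^2$ as read from the text, the sample size is identified with the number of intercept estimates, and all function names are hypothetical.

```python
import math
import random


def truncated_chi2_quantile(p, level, n_rep=20000, seed=0):
    """Monte Carlo (1 - level)-quantile of sum_{j<=p} max(Z_j, 0)^2, Z_j i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    draws = sorted(
        sum(max(rng.gauss(0.0, 1.0), 0.0) ** 2 for _ in range(p))
        for _ in range(n_rep)
    )
    index = max(0, min(n_rep - 1, math.ceil((1.0 - level) * n_rep) - 1))
    return draws[index]


def second_order_statistic(lag_estimates, intercept_estimates):
    """T * sum_j max(a_j, 0)^2 / sigma2_hat (assumed form of the statistic).

    sigma2_hat = T * sum_t a0_t^2 / (sum_t a0_t)^2, where a0_t are the
    estimated intercept values; T is taken as len(intercept_estimates).
    """
    t = len(intercept_estimates)
    sum_a0 = sum(intercept_estimates)
    sum_a0_sq = sum(a * a for a in intercept_estimates)
    sigma2_hat = t * sum_a0_sq / (sum_a0 ** 2)
    truncated = sum(max(a, 0.0) ** 2 for a in lag_estimates)
    return t * truncated / sigma2_hat


# For p = 1 the limit puts mass 1/2 at zero, so the 95% quantile is the
# 90% quantile of a chi-square with one degree of freedom (about 2.71).
q_one_lag = truncated_chi2_quantile(p=1, level=0.05)
```

Negative lag estimates contribute nothing to the statistic, which is what makes the test one-sided.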


Notes 

1. Note that the estimators of the lag coefficients correspond to the estimators (2.6) with $W_t = 1$. The
optimal rate of convergence /j = ^ for kernel estimation of Lipschitz functionals can be used if d > 5 
in assumption A2(4). A proof of Proposition 3 is given in the supplementary material. 

2. In the stationary case, two benchmark tests are usually used for testing the second order dynamic:
the Lagrange multiplier test of Engle (1982) and the portmanteau test of McLeod and Li (1983).
Patilea and Raissi (2014) have recently extended these two tests to model (1.3), taking nonstationarity
into account. Here we provide an alternative test based on a direct estimation of the lag coefficients.
A comparison of the different approaches is beyond the scope of this paper. Let us observe that in the
stationary case, the constant $\sigma^2$ in Proposition 3 is equal to 1, whereas in the nonstationary case, this
constant $\sigma^2$ is a correction factor which can be written

$$\sigma^2 = \frac{\int_0^1 a_0(u)^2\,du}{\left(\int_0^1 a_0(u)\,du\right)^2} \ge 1.$$

The correction factor $\sigma^2$ also appears in the asymptotic results of Patilea and Raissi (2014) (see the
ratio appearing in the two statistics used in that paper). Ignoring this factor leads to an oversized
test, and the null hypothesis will often be rejected when the data are independent but not identically
distributed. Moreover, let us notice that our moment condition for the noise distribution is less
restrictive for applying the test (a moment greater than 8 is assumed in Patilea and Raissi (2014)).

3. One can study the power of our test under local alternatives of type $\sqrt{T}\,(a_1, \ldots, a_p)' = s$, where $s$ is a
vector of nonnegative real numbers. One can show that if < 00 and the bandwidth b satisfies 

Tb^ —> 00 and Tb^ —> 0 , then 


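$$\mathbb{P}\left(\frac{T}{\hat\sigma^2}\sum_{j=1}^{p}\max(\hat a_j, 0)^2 > q_{1-\alpha}\right) \to \mathbb{P}\left(\sum_{j=1}^{p}\max\left(\frac{s_j}{\sigma} + Z_j,\, 0\right)^2 > q_{1-\alpha}\right),$$

where $q_{1-\alpha}$ is the quantile of order $1 - \alpha$ of the distribution of $\sum_{j=1}^{p}\max(Z_j, 0)^2$. Details are omitted.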
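The correction factor discussed in Note 2 can be evaluated numerically for a given intercept curve. The sketch below assumes the form $\int_0^1 a_0(u)^2du/(\int_0^1 a_0(u)du)^2$ and uses a plain Riemann average on an equispaced grid; the curve $a_0(u) = 1 + u$ is a hypothetical example. By the Cauchy-Schwarz inequality the ratio is at least 1, with equality for a constant intercept.

```python
def correction_factor(intercept_values):
    """Riemann-average version of int a0^2 / (int a0)^2 on an equispaced grid of [0, 1]."""
    n = len(intercept_values)
    mean = sum(intercept_values) / n
    mean_sq = sum(a * a for a in intercept_values) / n
    return mean_sq / (mean * mean)


grid = [i / 1000 for i in range(1001)]

# Constant intercept: the factor equals 1 (stationary case).
constant = correction_factor([2.0 for _ in grid])

# Hypothetical time-varying intercept a0(u) = 1 + u: exact value is 28/27.
varying = correction_factor([1.0 + u for u in grid])
```

A factor above 1 means that a test ignoring it uses too small a normalization and therefore rejects too often, in line with the remark on oversized tests.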


5 Real data applications 

Before real data applications, let us provide some recommendations on the choice of tuning parameters. 


5.1 Choice of tuning parameters 


The practical implementation of our estimation and testing procedures requires the choice of some weights
$W_t$, some bandwidth parameters as well as the number of lags $p$ in the model. In this subsection, we discuss
the practical choices of these parameters. In all our studies, the kernel $K$ will be the Epanechnikov kernel.
For simplicity, only one bandwidth parameter will be selected for the semiparametric models (this means
that we set $b' = b$ in (2.7)). We use a cross-validation method as specified below. The selected bandwidth
will be used for the tests. For the tests, the Monte Carlo procedure will always be applied with $B = 2000$
samples of i.i.d. standard Gaussian random variables.

In practice, a sequence of weights $(W_t)_t$ has to be chosen for applying our method. One possibility is to
use the weights $W_t = \left(1 + \sum_{j=1}^{p} X_{t-j}^2\right)^{-2}$ suggested in Horvath and Liese (2004). In our implementation,
we use the weights $W_t = \left(\hat v + \sum_{j=1}^{p} X_{t-j}^2\right)^{-2}$, where $\hat v = \frac{1}{T}\sum_{t=1}^{T} X_t^2$ is an estimate of the average of the
variance $v = \int_0^1 \mathbb{E}\left(X_0(u)^2\right) du$. There are several advantages in using these weights. First, the lag
estimates obtained in model (1.3) do not depend on the scale of the returns, a property always satisfied
by the true lag coefficients (if $P_t$ denotes the price at time $t$, $X_t = \log(P_t) - \log(P_{t-1})$ or $100 \times X_t$ are two
different scales used in practice). Moreover, for stationary ARCH processes, this choice is equivalent
to the weights used in Fryzlewicz et al. (2008). We also noticed better finite sample performances
for our tests and inference procedures with this choice. However, the introduction of the random
quantity $\hat v$ is not taken into account in our theoretical results. But inspection of our proofs shows
lhal fhe conclusions of fhe Iheorems remain unchanged if v - v = The laller condition 

is salisfied if E|W|^’ < 00 for h > | (fhis can be juslified using fhe momenl inequalily given in 
Fryzlewicz el al. (2008), see Femma A2). A sufficient condition for the finiteness of this moment is 
Ei/^'(|^ol'*)sup„,fo,i]i:;=i aj{u) < 1 which is more restrictive than the initial condition given in A3. 
Despite this slight restriction, we only consider the aforementioned sequence of weights in the sequel. 
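The scale-invariance property of the data-driven weights can be checked numerically. The sketch below is a simplified illustration, not the estimator of the paper: it fits a constant-coefficient ARCH(1) regression of $X_t^2$ on $(1, X_{t-1}^2)$ by weighted least squares, assuming weights of the form $(\hat v + X_{t-1}^2)^{-2}$, and compares the fit on returns and on returns multiplied by 100. The simulated data and all names are hypothetical.

```python
import math
import random


def simulate_arch1(n, a0=0.5, a1=0.3, seed=1):
    """Simulate a stationary ARCH(1) path X_t = xi_t * sqrt(a0 + a1 * X_{t-1}^2)."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n):
        sigma2 = a0 + a1 * x[-1] ** 2
        x.append(rng.gauss(0.0, 1.0) * math.sqrt(sigma2))
    return x[1:]


def weighted_arch1_fit(returns):
    """WLS of X_t^2 on (1, X_{t-1}^2) with W_t = (v_hat + X_{t-1}^2)^(-2)."""
    sq = [x * x for x in returns]
    v_hat = sum(sq) / len(sq)  # empirical average of the squared returns
    s_w = s_wx = s_wxx = s_wy = s_wxy = 0.0
    for t in range(1, len(sq)):
        x, y = sq[t - 1], sq[t]
        w = (v_hat + x) ** -2
        s_w += w
        s_wx += w * x
        s_wxx += w * x * x
        s_wy += w * y
        s_wxy += w * x * y
    # Closed-form solution of the 2 x 2 weighted normal equations.
    det = s_w * s_wxx - s_wx * s_wx
    intercept = (s_wxx * s_wy - s_wx * s_wxy) / det
    slope = (s_w * s_wxy - s_wx * s_wy) / det
    return intercept, slope


returns = simulate_arch1(500)
intercept_raw, slope_raw = weighted_arch1_fit(returns)
intercept_pct, slope_pct = weighted_arch1_fit([100.0 * x for x in returns])
```

Multiplying the returns by a constant $c$ rescales every weight by $c^{-4}$, which cancels in the normal equations: the lag estimate is unchanged and the intercept is multiplied by $c^2$. With a fixed constant in place of $\hat v$ this cancellation no longer holds.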


• For model (1.1), we use the cross-validation method considered in Fryzlewicz et al. (2008). The 
bandwidth parameter is selected by minimizing the function 

b^ Y, - X',a^-^\b))^ , 

/=p+i
where ^^\b) = 




K 


k=p+l 


t — k 

nr 


WtXlX'^ 


-I 




K 


k=p+\ 

k^t,...,t+p 


t — k 

~fb 


WkXtXk. 


For model (1.3), we choose the bandwidth b by minimizing 

T 


t=p+\ 


Here for n = 1,2, q^^bt version of q„^b,t (see (2.5)) defined as a~^^\b), Nt - ■ ■ ■ ’^I-p) 

and $\beta = (a_1, \ldots, a_p)'$. This type of cross-validation is a weighted version of the method proposed by
Hart (1994) for AR models with a time-varying mean. Note that minimizing the latter function with
respect to $\beta$, $b$ being fixed, leads to an estimate close to the estimate defined in (2.6).


• Finally, we discuss the selection of the number of lags $p$ in model (1.1). An information criterion has
been studied recently by Zhang and Wu (2012) for time-varying regression models. It is possible to
adapt the approach used by these authors to our setting. To this end, we define for $p = 0, 1, \ldots, q$,


C{p) = log 




t=p+\ 




+ (rip + 1 ). 


where Xt = ... ,Xf_pj , = (v -i- the coefficienls estimates obfained 

for fhe fv-ARCH wifh p lags buf compufed wifh fhe weighfs ITt?) and is a vanishing sequence of 
posifive numbers. The goal of fhe selecfion procedure is fo minimize p i-> C{p) in fhe spirif of AIC 
or BIC criferion used for regression models. The bandwidfh b is selecfed by cross validafion for fhe 
fv-ARCH model wifh q lags. Of course, condition on fhe decrease of has fo be imposed fo gef 
consistency (i.e P(argmino<p<^ C(p) = po) —> 1 if Po ^ ^ is the true number of lags). Inspecting the 
proofs of Lemma A6 and Theorem 3.3 in Zhang and Wu (2012), we find fhaf condition °° 

guarantees consisfency (using fhe argumenfs of fhe proof of Theorem 5, one can show fhaf fhe quanfify 
(pri.kpT + Pt) in Lemma A6 of Zhang and Wu (2012) can be replaced wifh r“2/3 confexf). For 
real data applications we choose the vanishing sequence equal to $\log(\log(T))/(Tb)$, where $b$ is selected using cross-validation.
Our intensive simulation study reported in the supplementary material shows that this choice gives
reasonable performances. This choice can also be justified using the argument given in Zhang and
Wu (2015): $(p+1)/b$ can be seen as the effective number of parameters in kernel smoothing. Hence
our choice has a similarity with the Hannan-Quinn information criterion, except that we gave up the constant $2c$
with $c > 1$ used in Hannan and Quinn (1979). We found that adding such a factor underestimates the
order in our case. A precise justification of our choice using a version of the law of the iterated logarithm
is nof fhe goal of fhis paper. However, one can notice fhaf applying cross-validafion on an inferval of 
type ty compatible wifh our consistency condition. In fhe applications, fhe maximal 

number of lags is set to $q = 10$.
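The lag-selection step can be sketched as follows. The exact criterion of the text is not reproduced here: the sketch assumes a criterion of the generic form "logarithm of a weighted mean squared residual plus penalty times $(p+1)$", with the penalty $\log(\log(T))/(Tb)$ quoted above for real data. The residual sums of squares are hypothetical numbers, and `select_lag` is an illustrative name.

```python
import math


def penalty(sample_size, bandwidth):
    """Vanishing sequence log(log(T)) / (T * b) quoted in the text for real data."""
    return math.log(math.log(sample_size)) / (sample_size * bandwidth)


def select_lag(weighted_rss, sample_size, bandwidth):
    """Minimize log(rss_p / T) + penalty * (p + 1) over p (assumed form).

    weighted_rss[p] is the weighted residual sum of squares of the fit with
    p lags, all fits being computed with the same weights and bandwidth.
    """
    r = penalty(sample_size, bandwidth)
    scores = [
        math.log(rss / sample_size) + r * (p + 1)
        for p, rss in enumerate(weighted_rss)
    ]
    best = min(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical residual sums of squares for p = 0, 1, 2, 3: a clear gain from
# the first lag, negligible gains afterwards.
best_p, scores = select_lag([1000.0, 900.0, 899.0, 898.5], sample_size=2000, bandwidth=0.05)
```

Because the penalty is divided by $b$, a smaller bandwidth (a more flexible fit per coefficient) makes each additional lag more expensive, consistent with the effective-number-of-parameters interpretation.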


Using the approach described above, we conducted an extensive simulation study. Some numerical
experiments are reported in the supplementary material and show the good behavior of our method for various
simulation setups. This simulation study shows that our estimators and tests have reasonable performances
for the sample sizes considered in the sequel. Here we only report the results obtained with real data.

For real data applications, the bandwidth will always be chosen using CV over a grid of the form $[c, C] \times T^{-1/3}$,
where $c$ and $C$ are two positive constants (we recall that $T^{-1/3}$ is the optimal rate of convergence for
the bandwidth in the nonparametric estimation of a Lipschitzian regression function). We will also use some
acronyms for our models. Model tv(p) denotes the tv-ARCH model with $p$ lags (the case $p = 0$ refers to
model (1.2)) while model sptv(p) denotes model (1.3) with $p \ge 1$. In the sequel, we consider two currency
exchange rates and one stock market index. The log returns $X_t = \log(P_t) - \log(P_{t-1})$ of the initial series
$(P_t)_t$ will be modeled with ARCH processes.
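Bandwidth selection over such a grid can be sketched in the simplest case $p = 0$, where only a time-varying variance is estimated. This is a simplified stand-in for the cross-validation criteria of Section 5.1, not their implementation: it uses an unweighted leave-one-out kernel average of the squared returns with the Epanechnikov kernel, assumes a grid of the form $[c, C] \times T^{-1/3}$, and runs on simulated returns. The constants $c = 0.5$, $C = 5$ and all names are hypothetical.

```python
import math
import random


def epanechnikov(v):
    """Epanechnikov kernel, supported on (-1, 1)."""
    return 0.75 * (1.0 - v * v) if abs(v) < 1.0 else 0.0


def loo_variance_estimate(squares, t, bandwidth):
    """Leave-one-out kernel average of X_i^2 around index t (index t excluded)."""
    n = len(squares)
    num = den = 0.0
    for i in range(n):
        if i == t:
            continue
        w = epanechnikov((t - i) / (n * bandwidth))
        num += w * squares[i]
        den += w
    return num / den if den > 0.0 else float("nan")


def cv_score(squares, bandwidth):
    """Sum of squared leave-one-out prediction errors for the squared returns."""
    return sum(
        (squares[t] - loo_variance_estimate(squares, t, bandwidth)) ** 2
        for t in range(len(squares))
    )


def select_bandwidth(returns, c=0.5, big_c=5.0, n_grid=6):
    """Minimize the CV score over an equispaced grid of [c, C] * T^(-1/3)."""
    n = len(returns)
    squares = [x * x for x in returns]
    rate = n ** (-1.0 / 3.0)
    grid = [(c + k * (big_c - c) / (n_grid - 1)) * rate for k in range(n_grid)]
    scores = [cv_score(squares, b) for b in grid]
    best = min(range(n_grid), key=scores.__getitem__)
    return grid[best], grid, scores


# Simulated returns with a slowly increasing standard deviation.
rng = random.Random(0)
n_obs = 300
returns = [rng.gauss(0.0, 1.0) * (1.0 + t / n_obs) for t in range(n_obs)]
bandwidth, grid, scores = select_bandwidth(returns)
```

Excluding the observation at time $t$ from its own fit is what prevents the criterion from always choosing the smallest bandwidth of the grid.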

5.2 Exchange rate USD/Euro 

In this subsection, we study the exchange rate series between the US Dollar and the Euro. We consider the 
period from January 03, 2000 to February 13, 2015. The sample size is T = 3799. As usual for this type of 
series, the autocorrelograms suggest correlation for the squares of the transformed series. 



Figure 1: Autocorrelogram and autocorrelogram of the squares for the logged and differenced daily
exchange rates USD/Euro

For this data set, the information criterion selects $p = 0$. To confirm this choice, we apply our procedure
with $p = 2$. The results for testing the hypothesis of non time-varying coefficients are reported in Table 1.
The intercept function does not seem constant, in contrast to the lag coefficients, for which it is not possible to
reject the null hypothesis. Fitting a sptv(2) process gives small negative values for the lag coefficients, so
it is of course not necessary to test whether they are equal to zero. This conclusion suggests an absence of second
order dynamic for this series. We refer the reader to Granger and Starica (2005) and Herzel et al. (2006) for
other analyses suggesting a similar behavior for some financial time series.

Non t-v a_0    Non t-v a_1    Non t-v a_2    Non t-v (a_1, a_2)    b
0.0005         0.138          0.1645         0.6415                0.028

Table 1: The p-values for testing the hypothesis of non time-varying coefficients (the first line gives the null
hypothesis)

A plot of the series with the final estimate of the intercept function is given in Figure 2.




Figure 2: Logged and differenced daily exchange rates between USD and Euro and estimation of the
unconditional variance

5.3 A second example: the exchange rates between the US Dollar and the Indian Rupee 

In this subsection, we analyze the exchange rate series between the US Dollar and the Indian Rupee over
the period from December 19, 2005 to February 18, 2015. The sample size is $T = 2303$. The
information criterion selects $p = 1$ lag for this series. From the p-values reported in Table 2, the hypothesis
of a constant intercept function is clearly rejected but it is not possible to reject the assumption of a non
time-varying first lag coefficient. Fitting a sp(l) process gives a small but significant lag estimates (the p- 
value for testing the second order dynamic is less than 10“'^). In contrast, fitting a stationary ARCH process 
with one lag (one can simply use $b = 1$ and our procedure) leads to $\hat a_1 = 0.3041$ (s.e. 0.0717) and several
significant lag estimates are found for larger values of $p$.


Non t-v a_0    Non t-v a_1    b_tv     \hat a_1                b_sptv
<kP            0.4215         0.035    0.1527 (s.e. 0.0688)    0.028

Table 2: Test and estimation for the USD/Rupee series (p-values for the tests, the selected bandwidths for
fitting a tv(1)/sptv(1) process and estimation of the first lag coefficient)

Figure 3: Logged and differenced daily exchange rates between the US Dollar and the Indian Rupee and
estimation of the intercept function

5.4 A classical stock market index: the FTSE 

Finally, we consider the closing values of the FTSE index from January 04, 2005 to March 04, 2015, taking
as usual the logged and differenced daily returns. Our information criterion selects $p = 5$ lags. Testing
constancy of the intercept function gives a p-value of 3 x 10“^ and the p-value for testing constant lag 
coefficients is 0.066. Hence, the assumption of constant lag coefficients is rejected at level $\alpha = 10\%$ (testing
constancy of the third and fourth coefficients gives the p-values 0.084 and 0.067 respectively; the other
p-values exceed 10%) and considering a tv-ARCH process for this data set could be interesting. We also
fit a sptv(5) process. The estimated lag parameters and their standard errors are reported in Table 3. The
p-value for testing the absence of second order dynamic is close to zero.


\hat a_1                \hat a_2                \hat a_3
0.0547 (s.e. 0.0321)    0.1155 (s.e. 0.0320)    0.1204 (s.e. 0.0311)

\hat a_4                \hat a_5                b_sp
0.0942 (s.e. 0.0367)    0.1201 (s.e. 0.0324)    0.063

Table 3: Estimated values of the lag coefficients and selected bandwidth for the FTSE

Here selecting the number of lags is important because fitting a sptv(2) process, for instance, does not
give significant lag estimates and the selected bandwidth for $p = 2$ is very small. In Fryzlewicz et al. (2008),
it is suggested that stationary GARCH models give better forecasts for stock market indices than tv-ARCH
processes and that this result could be explained by a more stationary behavior of these series compared
with currency exchange rates. This observation is compatible with our analysis of the FTSE index over this period,
which suggests that adding nonlinearity tends to remove nonstationarity, with larger
selected bandwidths. However, Figure 4 shows that incorporating a time-varying unconditional variance
significantly reduces the values of the ARCH parameters. In Figure 4, two extreme cases are observed. When
$b \to 0$, the sum of the lag coefficients becomes arbitrarily small, whereas the value $b = 1$ (which corresponds
to the fitting of a stationary ARCH process) leads to larger lag estimates. Moreover, in fitting a sptv(5)
process, the ratio ^lao{u)l&t has an average of 0.75 (s.e 0.14) which means that the contribution of the 
time-varying intercept to volatility is strong.



Figure 4: Sum of the first five lag coefficients with respect to the value of the bandwidth b (red dashed lines
correspond to the bandwidth selected using our method)

Supplementary material

A Auxiliary results for the proofs 

In this subsection, we consider a general time-varying ARCH process $(X_t)_{1\le t\le T}$ defined by

$$X_t = \xi_t\left(a_0(t/T) + \sum_{j=1}^{p} a_j(t/T)\,X_{t-j}^2\right)^{1/2}, \quad p+1 \le t \le T.$$

The different coefficients $a_j$ can be time-varying or not and we assume that assumption A1 is satisfied. Let
us introduce additional notation.

• We will always denote by $\|\cdot\|$ the Euclidean norm on $\mathbb{R}^g$ for an arbitrary positive integer $g$. The
corresponding operator norm on $\mathcal{M}_g$, the set of matrices of size $g \times g$ with real coefficients, will
also be denoted by $\|\cdot\|$.

• If $A$ is an integrable random variable taking values in $\mathcal{M}_g$, we set $\bar A = A - \mathbb{E}(A)$.

• For a sequence $b = b_T \in (0,1)$ of bandwidths, we recall the notation $k_{t,i}(b)$, $p+1 \le i,t \le T$, for the
kernel weights. Note that $\max_{p+1\le i,t\le T} k_{t,i}(b) = O\left((Tb)^{-1}\right)$. This bound will be extensively used in the sequel.

• For $1 \le t \le T$, we set $\mathcal{F}_t = \sigma(\xi_s : s \le t)$.

• Finally, we set $Z_t = \xi_t^2 - 1$ for $t \in \mathbb{N}$. Then $Z_t$ is centered.

• Important notations: for simplicity of notations, the quantities sj^b^t appearing in the statements of 
Theorems 1 and 3 will be simply denoted by S j^b,t for j - 1,2,3. 

We first give a lemma about the regularity of tv-ARCH processes. The following result is crucial for 
deriving asymptotic properties of our estimators and it is a direct consequence of Theorem 1 in Dahlhaus 
and Subba Rao (2006) (see also Subba Rao (2006), Theorem 2.1 and the discussion in Section 5.2). 

Lemma 1. 1. There exists a constant C > 0 such that for all (u, v, T) e [0,1]^ x N*, 


C 

T 



E|Ai(m) 2 - A^(v)| < C\u - v|, max E|a2 -Xf^- 


From this lemma, we get supy^^^j maxp+i^^^r EA^^ < -i-oo. 

In the sequel, we will use the following terminology. 

Definition 1. We will say that a sequence of functions $f_T : \{1, \ldots, T\} \times \mathbb{R}_+^p \to \mathbb{R}$, $T \ge p+1$, is in the class
$\mathcal{L}$ if there exist two positive real numbers $M$ and $L$, not depending on $T$, such that

$$f_T \text{ is bounded by } M, \qquad \max_{1\le t\le T}\left|f_T(t,x) - f_T(t,y)\right| \le L\|x - y\|.$$

Now, we will consider two particular classes of processes.

Definition 2. 1. A process $(Y_t)_{p+1\le t\le T}$ is said to be of type I if

$$Y_t = f_T\left(t, X_{t-1}^2, \ldots, X_{t-p}^2\right), \quad p+1 \le t \le T,$$

where $(f_T)_{T\ge p+1}$ is in the class $\mathcal{L}$.

2. A process is said to be of type II if

$$Y_t = f_T\left(t, X_{t-1}^2, \ldots, X_{t-p}^2\right) + Z_t\, g_T\left(t, X_{t-1}^2, \ldots, X_{t-p}^2\right), \quad p+1 \le t \le T,$$

where $(f_T)_{T\ge p+1}$ and $(g_T)_{T\ge p+1}$ are both in the class $\mathcal{L}$ and $g_T \ne 0$.

3. A process $S$ defined by

$$S_t = \sum_{i=p+1}^{T} k_{t,i}\,Y_i, \quad p+1 \le t \le T,$$

with $Y$ of type I or II will be called a smoothing.

Notes 

1. An important example of processes of type II is $Y_t = W_t X_{t-j}^2 X_t^2$, for $j \in [\![1,p]\!]$. Here $W_t$ is given by
equation (2.3) in the paper. This is due to the decomposition $Y_t = W_t\sigma_t^2 X_{t-j}^2 + Z_t W_t \sigma_t^2 X_{t-j}^2$ and to the
particular form of the weights $W_t$ and of $\sigma_t^2$.

2. Some smoothings appear in the expression of $\hat q_{j,b,t}$ for $j = 1, 2$. Our method for proving Theorem 1
is to make an asymptotic expansion of the estimator $\hat\beta$ and to show that the effect of the smoothings
incorporated in $\hat\beta$ is negligible by computing some moments. The terminology type I or type II is just
used for identifying the number of smoothings which impose a moment restriction.


A.l Covariance inequalities 

Here, we assume that the assumptions of Theorem 1 are fulfilled. Sometimes, assumption A2(h) can be
used for a general value of the integer $h$; this will be made precise in the statements of our lemmas and propositions.

Lemma 2. Let s and t be two natural integers such that T > t > s+1 > p + 1. Now let T^-i,..., 
be a random vector independent from the sequence {^t)i<t<T and with the same distribution as 
A^_i, ..., For s + \ <k < t, we define recursively 

rk = fk 



^2 _ y 2 | <2dc"^. 


where d - sup 7 ->p^j maxi^y^r ^Xf and c is defined in assumption

Proof of Lemma 2


Assume first that s+\<t<s + p. Then 

p 

Z 

i=i 

< cmaxE|A,l..-T^ 

i<j<P 


E\xj-r^\ = ^aj(^)E\X^_j-rlj\ 


-J 


t-j' 


< 2dc. 


Since 


f-5-1 


< 1, the result follows in this case. 


Suppose the inequality true for any t e |[5 + p,NJ, where A € |[5 + p, T - 1]|. Then 


E|4,i - 


< 

c X max 1 


^<j<p 

< 

2d max c 


^<j<p 


N-s 

< 

2dc p . 


^N+l-j ’^N+l-j 


1 + 


A+l- 7 -.y-l 


Then the result of the lemma follows from a finite induction. □ 

Lemma 3. Let h, s, t be three integers such that p < s < t < T and h> Assume that < 00 with 

0 < 5 < 1. Let Us be an integrable random variable 'T's—measurable and G : —> R a bounded and 

Lipschitzian function. We set 

U, - Z,,,. • • • Z,,4G {Xlp, Xl^^„ Xl,) , 

with 0 < £ 1 ,... ,{o ^ k and o < h. Assume that E\UsZt+ei • • • < “■ Then, we have 

Kits) 

\Cov{Us,Ut)\<{Ciy C2)c—, 


where, setting k - 


Cl = M(G)c-" {E\UsZ,^e, • • • Zi+rJ + E|G,| • E|Z,+;, • • • Z,^eX ■ 


and 

p +1 

C2 = ^-^^L{GrM{Gy-^EX\UsZ,^t, ■ • 

I - CP 

where d is given in Lemma 2 and M{G) (resp. L{G)) denotes the supremum (resp. the Lipschitz constant) of
the function G.

Proof of Lemma 3.


• Assume first that t - p < s. Then, it is easy to get the bounds 

ICov {Us, Ut) I < M{G) mUsZt^t, ■ ■ ■ 4 + 4 I + El^.l • • • • 4 + 4 I) 

< C\c p . 


• Now, assume that t > s + \ + p. We have the equality Cov {Us, Ut) = E (17^ {Ut - U{)), where 

u't=Zt^tr--Zt^ioG(rlp,...,yl,) 

is a random variable independent from J^s- Here T denotes the process introduced in Lemma 2. Using 
Lemma 2, the following bounds are valid. 


< 

< 

< 


< 


< 


< 


|Cov(U„[/,)| 

E| UsZt^t, ■ ■ • • |G (Xlp,Xl,) - G (rip,’tl,) I 

{lM{G)f-^nUsZt^e, ■ ■ ■ Zt^,J ■ |G [xl^,.. .,xl,) - G (y^, ..., rl,) f 

k 

L{GY {2M{G)Y-^ Y, ^\GsZt^tt • • • Zf+4l • \Xl^ - Y^^^f 

i=-p 

k 

L{Gr {1M{G)Y-^ Y ^MUsZt^t, ■ --Zt^tf^^ X - Y^^,-! 


k 

1 1 r I je V ^ 5 1 

{2dfL{Gf{2M{G)Y-^BTT-s\UsZt^er--^t^io\ ^Y ^ ~ 

i=-p 


[ L{GrM{GY-^B.MUsZt^t, ■ • •Z.+^J 

1 - CP 




t-s 

p 


Then the result announced in Lemma 3 is a direct consequence of the two previous points. □


The following corollary is a direct consequence of Lemma 3. 

Corollary 2. Let h > \ be an integer. Assume that < 00 with 0 < d < 1. Let s,t,q,o four 

non-negative integers such that p < s < t and q + o <h. Let Us and Ut two random variables defined by 

Us=Zhr--Zh,H(xl..., X^) , Ut - Zt^,, ■ ■ ■ Zt.,e^G (x^,..., xl,), 


where H and G are two elements of£, and \ < hi < ■ ■ ■ <hq < s and 0 < di < • • • < do < L Then, we have 
the bound 

Cov{Us,Ut)<{CiVC 2 )c^'-f, 
where Ci - M{G)M{H)c-'^ (E|Zi|^+" + ElZil'^ • E|Zi|^) and 

d‘<

A.2 Moment bounds

The next two results are crucial for proving the asymptotic normality of our estimators. In particular,
Proposition 4 gives conditions under which some partial sums involving smoothings converge to zero with a
faster rate than $\sqrt{T}$.

Lemma 4. Assume that < oo for a positive integer h. Let ..., be q > \ process of type I 

or II with at most h processes of type II. Then for a family {zT.i ■ p + ^ < i < T,T > p + \] of deterministic 
positive weights, we have 


Yj • ■ ■ 2r,'JE(F('^ • • • I - O ({fTST)i) , 

where st = ZjLp+i Ztj and fr = maXp+i<,-<7’Z7',;. 

s 

Proof of Lemma 4 We set jd = c'>“+■*>. The result is clear for ^ = 1. Assume that q > 2. First, we observe 
that from Corollary 2, we have for p -i- 1 < ti < • • • < < T and 1 < j < q - 

|Cov (Fj^^ • • • Y^f, Y^Y^ ■ ■ ■ Y^^^) I < (A. 1) 

where C > 0 does not depend on T and on t[,...,tq. Inequality (A.l) follows from the fact that the 
covariance given in (A.l) can be decomposed as a sum of covariances of the form Cov UtY given in 
Corollary 2 (replacing s and t with tj and ty+i respectively). Inequality (A.l) is crucial for the sequel. 

We set for 1 < j < q, 

(F(1), ..., F(^')) = 2] ZTM • ■ ■ zr.,-|E (f^') ■ • • F^^) |. 

p-Y\<i{<—<ij<T 

We use a classical method for bounding sums of cross moments using bounds on covariances (see Dedecker 
et al. (2007) p. 78). For a ^-uplet i = (/i,..., /^) € Up -t 1, r| such that /i < • • • < iq, we define 

5 (i) = min \j <q :i j+i -ij= max (k+i - k) \. 

{ \<t<q-\ ) 

Then, using the bound (A. 1), we have 

A^"^(FW,...,F(?)) 

J=t 

j=\ r=0 i:s{i)=j,ij+i=ij+r 

Since, 

Y^ ^T,ii ■ ■ ■ ZT,i^ < ‘5'r (r + 1)^, 

i:sii)=j,ij+i=ij+r

we conclude that


..., A^f (?^^\..., ..., F^'?)) + O (sT{(pTf~^). 

J=1 

1 ^ 

Since (pj < sj, we have st{(Pt)'^~ < {sjcPt)^- Then using an induction on q, it easy to prove that 

This proves Lemma 4. □

Proposition 4. Assume that < oo for a positive integer h and that b Vt ^ oo. Let F*^^\ ..., F^^^ 

be q > 2 processes of type I or II with at most h processes of type II. Yfe denote by the 

corresponding smoothings. 

1. Ifipt.T ■ p + < t < T,T > p + 1] denotes a family of real numbers such that 

sup max |//f 7 -|<oo, 
r>p+i 

we have, using the notations of point 2, 

t=p+i 

2. We have also Tjl=p+i = op(l)- 

3. If Y is a process of type I (resp. II), we have for all positive integer h' (resp. h = h'), maxi^f^p ES p' = 

o({Tbr'^'). 

4. Assume that F^^^ is a process of type I. Then Y YJt=p+\ F^'^ converges to 0 a.s. 


Proof of Proposition 4 

1. Assume that sup^y Iptjl ^ C. Taking the second order moment, we get 


^ I, vlT oC 

Tff L ■■■St 

'''' t=p+l 


(?)|2 


C 


i2 T 


^ ^ ks,itKMi ■ ■ ■ 


S,t=p+\ il,Ml,.:,iq,jq=P+t 

p(Ay(9) 

iq jq 


y(l)y(l) 
h Ml 


Using Lemma 2 (replacing h with 2h) with = 1 and fp = O ( 7 ^)^ the last bound is The 

result follows using the bandwidth assumptions.

2. The second-order moment can be written as


^ T T 

'r Z Z 




'ks,L_lktJ I ■ 


s,t=p+l = l 


' f n ji 


yiq) y(?) 

^q-\ jq-\ 


Using the bound maxp+i<( ,<r ,■ = o{^-^ and applying Lemma 2 with ZT,i - F it is easy to show 

that this second order moment is O This leads to the result.n 

3. This is a consequence of Lemma 2, using the inequality 

E{sf)< Y, ^mu--F,,*,|e(FvF-,,,)|. 


4. We have for e > 0, 



V *=p+i J 


t=p+l 

^ Z pUF"F'’)i 

Using Lemma 2, the last bound is O ( Then the result follows from the Borel-Cantelli Lemma. □ 

A.3 Control of deterministic quantities using local stationarity 

Lemma 5. 1. For u e [0,1], we set S 2 {u) - E{Wi(u)Mi{u)Ni{uy) and s${u) - E(1Fi(m)Mi(m)Mi(m)'). 

Then we have 

^ max^ {||X3,, - (^) II + ll«,, - « (^) ll) = o(iJ. (A.2) 


inf det(53(M)) > 0, (A.3) 

ME [ 0 , 1 ] 


sup ||^ 2 (m)II < (A.4) 

ME [ 0 , 1 ] 


max \\q 2 ,t-q 2 
p+l<t<T 




(A.5)

2.


'We have 


sup max {||E-'(53,OII + I|E(Si,0II + ||E(52,0I||<“- 


3. Setting rjj^ = E - qj^tfar j = 1,2, we have 


max 

p+l<t<T 


|l|77i,rll + \\q2,t\\} 


0{b). 


4. We have 


max 

p+i<?<r 


{l|53,rll + l|5;j||) = OKl)- 


(A.6) 


Proof of Lemma 5 

1. We prove the four assertions successively. 

• Since WtMfM' = f (jlT,Xj_^, ... where / : [0,1] x R+ ^ satisfies 

p 

\\f{u,xi,...,Xp)- f{u,yi,...,yp)\\ < C^|x,-y,| 

;=i 

for some positive constant C, Lemma 1 given in the paper yields to maxp+i^y^r || 53 _f - || = 

0(1/7). The conclusion for ^ 2 ,? follows in the same way. This shows (A.2). 

• Next, we show (A.3). Let $\lambda(u)$ be the smallest eigenvalue of $\mathbb{E}\left(W_1(u)M_1(u)M_1(u)'\right)$. From
Lemma 1, the mapping $u \mapsto \mathbb{E}\left(W_1(u)M_1(u)M_1(u)'\right)$ is Lipschitz continuous. Moreover, it is
easily shown that for all $u \in [0,1]$, the matrix $\mathbb{E}\left(W_1(u)M_1(u)M_1(u)'\right)$ is positive definite. This
entails that the mapping $u \mapsto \lambda(u)$ is continuous and positive. This implies (A.3).

• Since $\sup_{u\in[0,1]}\|W_1(u)M_1(u)N_1(u)'\|$ is bounded, we deduce from (A.3) that $\sup_{u\in[0,1]}\|q_2(u)\| < \infty$.

• The assertion (A.5) easily follows from (A.2), (A.3) and (A.4).


2. Since

\[ C = \sup_{\omega, t, T} W_t(\omega)\|M_t(\omega)\| < \infty, \]

we have

\[ \|\mathbb{E}(S_{1,t})\| \le \sum_{i=p+1}^T k_{t,i} \|\mathbb{E}(W_i M_i X_i^2)\| \le C \sup_{T \ge p+1} \max_{1 \le t \le T} \mathbb{E}(X_t^2). \]

The same kind of inequality holds for $\|\mathbb{E}(S_{2,t})\|$.
It remains to prove that

\[ \sup_{T \ge p+1} \max_{p+1 \le t \le T} \|\mathbb{E}^{-1}(S_{3,t})\| < \infty. \tag{A.7} \]

If $x \in \mathbb{R}^m$, with $\|x\| = 1$, we have using (A.2),

\[ \inf_{p+1 \le t \le T} x' s_{3,t} x \ge \inf_{u \in [0,1]} x' s_3(u) x - \frac{C}{T}, \]

for a suitable constant $C > 0$. Then, using (A.3), there exists $\lambda > 0$ such that $x'\mathbb{E}(S_{3,t})x \ge \lambda - C/T$.
We deduce that if $T$ is large enough, the smallest eigenvalue of $\mathbb{E}(S_{3,t})$ is bounded from below. This means that there exists an integer $T_0 > p+1$ such that

\[ \sup_{T \ge T_0} \max_{p+1 \le t \le T} \|\mathbb{E}^{-1}(S_{3,t})\| < \infty. \]

But since each of the finitely many matrices $\mathbb{E}(S_{3,t})$ is easily shown to be positive definite for $p+1 \le t \le T$ and $T \le T_0$, (A.7) easily follows.
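
The eigenvalue argument of point 2 can be checked numerically. The sketch below is only an illustration and not part of the proof: it assumes that $\mathbb{E}(S_{3,t})$ is a weighted average of positive definite matrices with nonnegative weights summing to one, and the 2x2 matrices and the weights used here are hypothetical values chosen for the example. It verifies that the smallest eigenvalue of the average is at least the smallest of the individual eigenvalues, which is what keeps the norm of the inverse bounded.

```python
import math

def lambda_min(s):
    """Smallest eigenvalue of a symmetric 2x2 matrix [[a, c], [c, d]]."""
    a, c, d = s[0][0], s[0][1], s[1][1]
    return 0.5 * (a + d) - math.sqrt(0.25 * (a - d) ** 2 + c * c)

def weighted_average(mats, weights):
    """Entrywise weighted average sum_i weights[i] * mats[i] of 2x2 matrices."""
    return [[sum(w * m[r][c] for w, m in zip(weights, mats)) for c in range(2)]
            for r in range(2)]

# Hypothetical stand-ins for the matrices s_{3,i}: symmetric, positive definite.
mats = [[[2.0 + math.sin(i), 0.5 * math.cos(i)],
         [0.5 * math.cos(i), 1.0 + 0.1 * i]] for i in range(6)]
# Hypothetical kernel weights k_{t,i}: nonnegative and summing to one.
weights = [0.05, 0.2, 0.25, 0.25, 0.2, 0.05]

lam_each = [lambda_min(m) for m in mats]
lam_avg = lambda_min(weighted_average(mats, weights))
# For a symmetric positive definite matrix, the spectral norm of the
# inverse equals 1 / (smallest eigenvalue).
inverse_norm_bound = 1.0 / lam_avg
print(min(lam_each), lam_avg, inverse_norm_bound)
```

The check relies on the concavity of the smallest eigenvalue on symmetric matrices, so the inequality holds for any choice of positive definite matrices and convex weights, not only for the values above.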

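3. For a Lipschitz function $f$ defined over $[0,1]$, the assumptions made on the kernel $K$ imply that

\[ \max_{1 \le t \le T} \left| f(t/T) - \sum_{i=1}^T k_{t,i} f(i/T) \right| = O(b). \]

We only prove that $\max_{p+1 \le t \le T} \|\eta_{1,t}\| = O(b)$, the proof for $\eta_{2,t}$ is similar. We use the decomposition

\[ \eta_{1,t} = s_{3,t}^{-1}\left(s_{3,t} - \mathbb{E}(S_{3,t})\right)\mathbb{E}^{-1}(S_{3,t})\, s_{1,t} + \mathbb{E}^{-1}(S_{3,t})\left(\mathbb{E}(S_{1,t}) - s_{1,t}\right). \tag{A.8} \]

From the proof of the first two points of the present Lemma, it is easily seen that

\[ \sup_{T \ge p+1} \max_{p+1 \le t \le T} \left\{ \|\mathbb{E}^{-1}(S_{3,t})\| + \|s_{1,t}\| + \|s_{3,t}^{-1}\| \right\} < \infty. \tag{A.9} \]

Moreover

\[ \max_{p+1 \le t \le T} \left\| s_{3,t} - \mathbb{E}\left[W_1(t/T)M_1(t/T)M_1(t/T)'\right] \right\| = O(1/T). \]

Since $s_{1,t} = \mathbb{E}(W_t M_t \sigma_t^2)$, the choice of the weights $W_t$ entails also

\[ \max_{p+1 \le t \le T} \left\| s_{1,t} - \mathbb{E}\left[W_1(t/T)M_1(t/T)X_1^2(t/T)\right] \right\| = O(1/T). \]

Now, since the two mappings

\[ u \mapsto d(u) = \mathbb{E}\left(W_1(u)M_1(u)X_1^2(u)\right), \qquad u \mapsto e(u) = \mathbb{E}\left(W_1(u)M_1(u)M_1(u)'\right) \]

are Lipschitz continuous, we get

\[ \max_{p+1 \le t \le T} \left\{ \|\mathbb{E}(S_{1,t}) - s_{1,t}\| + \|\mathbb{E}(S_{3,t}) - s_{3,t}\| \right\} = O(b). \tag{A.10} \]

Then, the announced result follows easily from (A.8), (A.9) and (A.10).

4. We use the decomposition $S_{3,t} = \bar S_{3,t} + \mathbb{E}(S_{3,t})$, where $\bar S_{3,t} = S_{3,t} - \mathbb{E}(S_{3,t})$. From the previous points, we have

\[ \max_{p+1 \le t \le T} \left\{ \|\mathbb{E}(S_{3,t})\| + \|\mathbb{E}^{-1}(S_{3,t})\| \right\} = O(1). \]

Moreover for $\epsilon > 0$, we have using point 3 of Proposition 3,

\[ \mathbb{P}\left( \max_{p+1 \le t \le T} \|\bar S_{3,t}\| > \epsilon \right) \le \frac{1}{\epsilon^4} \sum_{t=p+1}^T \mathbb{E}\|\bar S_{3,t}\|^4 \le \frac{C}{\epsilon^4 T b^2}. \]

Then we conclude that $\max_{p+1 \le t \le T} \|\bar S_{3,t}\| = o_P(1)$. Then (A.6) easily follows. □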
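
The kernel approximation used at the start of point 3 (a Lipschitz function evaluated at $t/T$ differs from its kernel-weighted average by at most a quantity of order $b$) can be illustrated with a small computation. This is a sketch under stated assumptions rather than the paper's exact setting: the weights are taken to be normalized Epanechnikov weights $k_{t,i} \propto K((t-i)/(Tb))$ summing to one, and the test function, sample size and bandwidths are hypothetical choices for the example.

```python
def epanechnikov(v):
    """Epanechnikov kernel: Lipschitz and supported on [-1, 1]."""
    return 0.75 * (1.0 - v * v) if abs(v) <= 1.0 else 0.0

def kernel_weights(t, T, b):
    """Hypothetical normalized weights k_{t,i}, i = 1..T, summing to one."""
    raw = [epanechnikov((t - i) / (T * b)) for i in range(1, T + 1)]
    total = sum(raw)  # positive, since the term i = t equals K(0) = 0.75
    return [r / total for r in raw]

def max_smoothing_error(f, T, b):
    """Maximum over t of |f(t/T) - sum_i k_{t,i} f(i/T)|."""
    worst = 0.0
    for t in range(1, T + 1):
        k = kernel_weights(t, T, b)
        smoothed = sum(k[i - 1] * f(i / T) for i in range(1, T + 1))
        worst = max(worst, abs(f(t / T) - smoothed))
    return worst

def f(u):
    """A Lipschitz function on [0, 1] with Lipschitz constant 1."""
    return abs(u - 0.4)

T = 400
errors = {b: max_smoothing_error(f, T, b) for b in (0.2, 0.1, 0.05)}
for b, err in errors.items():
    print(b, err)
```

Because the weights vanish when $|t - i| \ge Tb$ and sum to one, the error is bounded by the Lipschitz constant times $b$, boundary points included, which is the $O(b)$ rate quoted above.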
B Proof of Theorem 1


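Setting

\[ \hat\beta = D_T^{-1} L_T, \qquad L_T = \sum_{t=p+1}^T W_t \hat O_t \hat V_t, \qquad D_T = \sum_{t=p+1}^T W_t \hat O_t \hat O_t', \]

we have $\hat\beta - \beta = D_T^{-1} \sum_{t=p+1}^T W_t \hat O_t \left( \hat V_t - \hat O_t'\beta \right)$. Using the two relations $\hat V_t = V_t - M_t'\left(\hat q_{1,t} - q_{1,t}\right)$ and $\hat O_t = O_t - \left(\hat q_{2,t} - q_{2,t}\right)' M_t$, we obtain

\[ \hat\beta - \beta = D_T^{-1} \sum_{t=p+1}^T W_t \left( O_t - (\hat q_{2,t} - q_{2,t})' M_t \right) \cdot \left( \sigma_t^2 Z_t - M_t'(\hat q_{1,t} - q_{1,t}) + M_t'(\hat q_{2,t} - q_{2,t})\beta \right). \]

We also set $U_t = W_t O_t M_t'$. Observe that $\mathbb{E} U_t = 0$. This yields the following decomposition:

\[ \hat\beta - \beta = D_T^{-1} \left( L_{1,T}\beta - L_{2,T}\beta - L_{3,T} + L_{4,T} + L_{5,T} - L_{6,T} \right), \]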

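where

\[ L_{1,T} = \sum_{t=p+1}^T U_t \left(\hat q_{2,t} - q_{2,t}\right), \qquad L_{2,T} = \sum_{t=p+1}^T W_t \left(\hat q_{2,t} - q_{2,t}\right)' M_t M_t' \left(\hat q_{2,t} - q_{2,t}\right), \]

\[ L_{3,T} = \sum_{t=p+1}^T U_t \left(\hat q_{1,t} - q_{1,t}\right), \qquad L_{4,T} = \sum_{t=p+1}^T W_t \left(\hat q_{2,t} - q_{2,t}\right)' M_t M_t' \left(\hat q_{1,t} - q_{1,t}\right), \]

\[ L_{5,T} = \sum_{t=p+1}^T W_t O_t Z_t \sigma_t^2, \qquad L_{6,T} = \sum_{t=p+1}^T W_t \left(\hat q_{2,t} - q_{2,t}\right)' M_t Z_t \sigma_t^2. \]

$L_{5,T}$ is the main term in the asymptotic expansion of the numerator $L_T$. We will use the formula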

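\[ B^{-1}A - b^{-1}a = b^{-1}(A - a) - b^{-1}(B - b)b^{-1}(A - a) - b^{-1}(B - b)b^{-1}a + B^{-1}(B - b)b^{-1}(B - b)b^{-1}A. \tag{B.1} \]

Now for $j = 1,2,3$, we set $\bar S_{j,t} = S_{j,t} - \mathbb{E}(S_{j,t})$. Using (B.1), we have for $j = 1,2$,

\[ \hat q_{j,t} - q_{j,t} = \mathbb{E}^{-1}(S_{3,t})\mathbb{E}(S_{j,t}) - q_{j,t} + S_{3,t}^{-1}\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t})\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t}) S_{j,t} \]
\[ \qquad + \mathbb{E}^{-1}(S_{3,t})\bar S_{j,t} - \mathbb{E}^{-1}(S_{3,t})\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t})\bar S_{j,t} - \mathbb{E}^{-1}(S_{3,t})\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t})\mathbb{E}(S_{j,t}). \tag{B.2} \]

To prove Theorem 1, we will prove that

\[ \frac{1}{\sqrt T} L_{5,T} \Rightarrow N_n(0, \Sigma_2), \tag{B.3} \]

\[ \frac{1}{\sqrt T} L_{j,T} = o_P(1), \quad j \in \{1,2,3,4,6\}, \tag{B.4} \]

\[ \lim_{T \to \infty} \frac{1}{T} D_T = \Sigma_1, \quad \text{a.s.} \tag{B.5} \]

The proofs of (B.3), (B.4) and (B.5) are established in the following subsections.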
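
The algebraic identity (B.1) holds for any invertible matrices $B$, $b$ and any conformable $A$, $a$, and it can be sanity-checked numerically. The following sketch is an illustration only; the 2x2 matrices are arbitrary hypothetical values and the helper functions are written for this example.

```python
from functools import reduce

def matmul(X, Y):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def sub(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def inv(X):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def chain(*mats):
    """Ordered product of several 2x2 matrices."""
    return reduce(matmul, mats)

def b1_sides(A, B, a, b):
    """Return the left- and right-hand sides of identity (B.1)."""
    bi, Bi = inv(b), inv(B)
    dA, dB = sub(A, a), sub(B, b)
    lhs = sub(matmul(Bi, A), matmul(bi, a))
    rhs = chain(bi, dA)
    rhs = sub(rhs, chain(bi, dB, bi, dA))
    rhs = sub(rhs, chain(bi, dB, bi, a))
    rhs = add(rhs, chain(Bi, dB, bi, dB, bi, A))
    return lhs, rhs

def max_abs_diff(X, Y):
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))

# Arbitrary illustrative values: (A, B) plays the role of the random pair,
# (a, b) the role of its deterministic counterpart.
b = [[2.0, 0.3], [0.3, 1.5]]
a = [[1.0, -0.5], [0.2, 0.7]]
B = [[2.1, 0.25], [0.35, 1.4]]
A = [[1.1, -0.4], [0.1, 0.9]]

lhs, rhs = b1_sides(A, B, a, b)
gap = max_abs_diff(lhs, rhs)
print(gap)
```

In the proof, (B.1) is applied with $B = S_{3,t}$, $A = S_{j,t}$, $b = \mathbb{E}(S_{3,t})$ and $a = \mathbb{E}(S_{j,t})$, so that the only term involving the random inverse $S_{3,t}^{-1}$ is of second order in $\bar S_{3,t}$.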
Proof of assertion (B.3) To prove (B.3), we use the central limit theorem for triangular arrays of martingale differences (see Pollard (1984) Chapter VIII.1, Theorem 1). Using the Cramér-Wold device, it is enough to prove that

\[ \frac{1}{\sqrt T} \sum_{t=p+1}^T W_t \sigma_t^2\, x'O_t Z_t \Rightarrow N\left(0, x'\Sigma_2 x\right), \tag{B.6} \]

for each vector $x \in \mathbb{R}^n$. We set

\[ A_t = W_t \sigma_t^2\, x'O_t Z_t, \qquad \tilde A_t = W_t \sigma_t^2\, x'\left(N_t - q_2(t/T)' M_t\right) Z_t. \]

Then, if $\mathcal{F}_t = \sigma(\xi_s : s \le t)$, the two families $\{(A_t, \mathcal{F}_t) : p+1 \le t \le T\}$ and $\{(\tilde A_t, \mathcal{F}_t) : p+1 \le t \le T\}$ form a martingale difference. Their corresponding partial sums are asymptotically equivalent because the quantity $q_{2,t}$ is simply replaced by $q_2(t/T)$ in the expression of $A_t$. Indeed, we have

\[ |A_t - \tilde A_t| \le \|x\|_2 \cdot |Z_t| \cdot \|q_{2,t} - q_2(t/T)\| \cdot \|M_t W_t \sigma_t^2\|. \]

Using the fact that $M_t W_t \sigma_t^2$ is bounded uniformly in $t$ and Lemma 5, 1., we deduce that $\frac{1}{\sqrt T} \sum_{t=p+1}^T (A_t - \tilde A_t) \to 0$, in probability. As a consequence, it is sufficient to prove (B.6) for $\tilde A_t$ instead of $A_t$. Moreover,

E(A?|r^-i) = ml - IP • x'Wja^t - 92 - 92 (^)' x. 

The process G defined by Gt = [Nt - 92 (f) Mtj (^Nt - q 2 (f) Aftj is (coordinatewise) a process 

of type / and Proposition 4 (point 4) leads to 

lim 7 ’_^oo f FjJLp+i Gt = 0 in probability. Moreover, using Lemma 1, Lemma 5 (A.3) and some Lipschitz 
properties, one can show that 


lim 

T --*cc 


T 

z 

t=p+\ 


ml 


11 ^ 


■ E(Gf) - lim - 

i —>00 i 


z 

f=p+l 




ipE(Gi(t/r)) -ui;2x. 


Then we get $\frac{1}{T} \sum_{t=p+1}^T \mathbb{E}\left(\tilde A_t^2 \mid \mathcal{F}_{t-1}\right) \to x'\Sigma_2 x$, in probability. Next, we check the Lindeberg condition. If $\epsilon > 0$,
we have


I.,vf11 

We easily deduce that h 'lJi=p+i^{M^\A,\>^mi) 


2+5^^2+5^4+25|^, (a, - 92 

0 in probability. This proves (B. 6 ) and (B.3) follows. 


Proof of $\frac{1}{\sqrt T} N_{j,T} = o_P(1)$ for $j \in \{1,3,6\}$.

We only prove the result for $j = 3$. The two other cases can be treated in the same way. We set
$\eta_{1,t} = \mathbb{E}^{-1}(S_{3,t})\mathbb{E}(S_{1,t}) - q_{1,t} \in \mathcal{M}_{m,1}$. The proof of the result follows from the following points.

1. We first prove that $\frac{1}{\sqrt T} \sum_{t=p+1}^T U_t \eta_{1,t} = o_P(1)$. It is enough to prove that for $(w,y) \in \{1,\ldots,n\} \times \{1,\ldots,m\}$,

\[ \frac{1}{\sqrt T} \sum_{t=p+1}^T U_t(w,y)\,\eta_{1,t}(y,1) = o_P(1). \]


But this assertion follows from Lemma 5 (3.) and Lemma 4, since Ez^ can be bounded by (up to a 
positive constant). 





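2. Now we prove that

\[ \frac{1}{\sqrt T} \sum_{t=p+1}^T U_t\, \mathbb{E}^{-1}(S_{3,t})\, \bar S_{1,t} = o_P(1). \tag{B.7} \]

In order to prove (B.7), it is enough to prove that for a given vector $(w,y,z) \in \{1,\ldots,n\} \times \{1,\ldots,m\} \times \{1,\ldots,m\}$,

\[ \frac{1}{\sqrt T} \sum_{t=p+1}^T U_t(w,y)\, \mathbb{E}^{-1}(S_{3,t})(y,z)\, \bar S_{1,t}(z,1) = o_P(1). \]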


But the result is a direct consequence of Lemma 5 (point 2) and Proposition 4 (point 2 applied with 
h = 1). 

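3. Using the same arguments as for point 2, one can easily show that

\[ \sum_{t=p+1}^T U_t\, \mathbb{E}^{-1}(S_{3,t})\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t})\bar S_{1,t} = o_P(\sqrt T), \qquad \sum_{t=p+1}^T U_t\, \mathbb{E}^{-1}(S_{3,t})\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t})\mathbb{E}(S_{1,t}) = o_P(\sqrt T). \]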


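4. Finally, we prove that

\[ \sum_{t=p+1}^T U_t\, S_{3,t}^{-1}\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t})\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t}) S_{1,t} = o_P(\sqrt T). \]

Since $U_t$ is uniformly bounded in $\omega, t, T$, there exists $C > 0$ such that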
1 r _ 

II— 2] UtS-^,S^,W\S^,)SytW\S^,)Su\\ 

VJ t=p+\ 

T 

< max \\S^]\\- Y ll‘53,fll'-||5ull 


(B.8) 


< c mg, iiijjii. ‘ x; iisrf■, i; 

‘ y t=p+i V=p+i 


l|5l,||2. 


From Proposition 4, we have the bounds 


max E||5'3<|r = C? 
p+l<t<T 


1 


max EyS 


T^b^j' p+i<t<T 


.,,lP = o(L). 


Then using the point 2 of Lemma 5, we conclude that ^ YJt=p+\ = Op(l)- Finally, using Lemma 

5 (4.), we conclude that 

11^ y U,S3;S3.fE-i(S3,^)S3,^E-i(S3,,)Si,,|| = (9p(^). 

vvfi,/ 


Hence, (B.8) follows using the assumption b Vt 
Proof of $\frac{1}{\sqrt T} N_{j,T} = o_P(1)$ for $j \in \{2,4\}$.

We only prove the result for $j = 4$, the proof for the case $j = 2$ is similar. Here, we only use the basic decompositions

\[ \hat q_{k,t} - q_{k,t} = \eta_{k,t} + \mathbb{E}^{-1}(S_{3,t})\bar S_{k,t} - S_{3,t}^{-1}\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t}) S_{k,t}, \]

for $k = 1,2$ and with

\[ \eta_{k,t} = \mathbb{E}^{-1}(S_{3,t})\mathbb{E}(S_{k,t}) - q_{k,t}. \]

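Then we have

\[ \|N_{4,T}\| \le \sum_{t=p+1}^T \|\hat q_{2,t} - q_{2,t}\| \cdot \|W_t M_t M_t'\| \cdot \|\hat q_{1,t} - q_{1,t}\| \le \frac{C_1}{2} \sum_{t=p+1}^T \left\{ \|\hat q_{2,t} - q_{2,t}\|^2 + \|\hat q_{1,t} - q_{1,t}\|^2 \right\}, \tag{B.9} \]

where $C_1 = \sup_{\omega,t,T} \|W_t M_t M_t'\|$. It remains to prove that for $k = 1,2$,

\[ \frac{1}{\sqrt T} \sum_{t=p+1}^T \|\hat q_{k,t} - q_{k,t}\|^2 = o_P(1). \tag{B.10} \]

We only prove (B.10) for $k = 1$, the proof for $k = 2$ being the same. The proof easily follows from the three following points.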


1. From Lemma 5 (3.) and our bandwidth condition, we have $\frac{1}{\sqrt T} \sum_{t=p+1}^T \|\eta_{1,t}\|^2 = o(1)$.

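2. Since

\[ C_2 = \sup_{T \ge p+1} \max_{p+1 \le t \le T} \|\mathbb{E}^{-1}(S_{3,t})\| \]

is finite using Lemma 5 (point 2), we use the inequality

\[ \frac{1}{\sqrt T} \sum_{t=p+1}^T \|\mathbb{E}^{-1}(S_{3,t})\bar S_{1,t}\|^2 \le \frac{C_2^2}{\sqrt T} \sum_{t=p+1}^T \|\bar S_{1,t}\|^2. \]

But we know from Proposition 4 that $\max_{p+1 \le t \le T} \mathbb{E}\|\bar S_{1,t}\|^2 = O\left(\frac{1}{Tb}\right)$. Then condition $b\sqrt T \to \infty$ entails that

\[ \frac{1}{\sqrt T} \sum_{t=p+1}^T \mathbb{E}\left(\|\bar S_{1,t}\|^2\right) = o(1). \]

This shows that $\frac{1}{\sqrt T} \sum_{t=p+1}^T \|\mathbb{E}^{-1}(S_{3,t})\bar S_{1,t}\|^2 = o_P(1)$.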


3. Finally, we show that

\[ \frac{1}{\sqrt T} \sum_{t=p+1}^T \|S_{3,t}^{-1}\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t}) S_{1,t}\|^2 = o_P(1). \tag{B.11} \]

We have $\|S_{3,t}^{-1}\bar S_{3,t}\mathbb{E}^{-1}(S_{3,t}) S_{1,t}\|^2 \le C_2^2 \max_{p+1 \le t \le T} \|S_{3,t}^{-1}\|^2 \cdot \|\bar S_{3,t}\|^2 \|S_{1,t}\|^2$, where $C_2 > 0$ is defined in the previous point. We have $\max_{p+1 \le t \le T} \|S_{3,t}^{-1}\| = O_P(1)$ (see Lemma 5, point 4) and

^\S3,,f • IIS 1,,||") < Eifs (||S3,^11"^) • Et^ (\\S . 

Without loss of generality, one can assume that ^ is an integer. Using Proposition 4 (point 3), we 
have maXp+i<f< 7 'E^||S 3 ,,|p^^ = 0^{Tb)~^^. Moreover, using convexity, the moment assumption 
on the noise and the fact that WiMicr^ is bounded, 

max E||5i,||^(^+^^ < max = 0(1). 

p+l<t<T ' p+\<i<T 

Then assertion (B.ll) easily follows from the condition ^^Tb 00 . 

B.1 Proof of assertion (B.5)

Recalling that $\hat O_t = O_t - (\hat q_{2,t} - q_{2,t})' M_t$, we have

\[ D_T = \sum_{t=p+1}^T W_t O_t O_t' + \sum_{t=p+1}^T W_t (\hat q_{2,t} - q_{2,t})' M_t M_t' (\hat q_{2,t} - q_{2,t}) - \sum_{t=p+1}^T W_t (\hat q_{2,t} - q_{2,t})' M_t O_t' - \sum_{t=p+1}^T W_t O_t M_t' (\hat q_{2,t} - q_{2,t}). \]

We have already shown that the three last terms in the previous decomposition are $o_P(1)$. Then, it remains to show that

\[ \lim_{T \to \infty} \frac{1}{T} \sum_{t=p+1}^T W_t O_t O_t' = \Sigma_1, \tag{B.12} \]

in probability. One can obtain (B.12) using the same arguments as for deriving the limit of

\[ \frac{1}{T} \sum_{t=p+1}^T \mathbb{E}\left(\tilde A_t^2 \mid \mathcal{F}_{t-1}\right) \]

in the proof of assertion (B.3). □

C Auxiliary results for the proof of Theorem 2 

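Corollary 3. Let $(Y_t)_{1 \le t \le T}$ be a process of type II. We denote by $(S_t)_{1 \le t \le T}$ the corresponding smoothing.
Then, under the assumptions of Theorem 2, we have

\[ \max_{1 \le t \le T} |S_t| = o_P\left(T^{-1/4}\right). \]

Consequently, if for $\ell \in \{1,2\}$, $\left(S_t^{(\ell)}\right)$ is a smoothing then

\[ \max_{1 \le t_1, t_2 \le T} \left| S_{t_1}^{(1)} S_{t_2}^{(2)} \right| = o_P\left(\frac{1}{\sqrt T}\right). \]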
Proof of Corollary 3 We set j - 2 j + 1^. Then using the point 4 of Proposition 3, we have


max |S r| > eT 4 

p+l<t<T 


< 


j T 

CT4 


Y, 'I' 


t=p+i 


< c- 


r? 


(Tby^ 


where C > 0 denotes a generic constant and e > 0 is arbitrary. Since, 


yl+i 

iTb)i 


< 




, the result follows 


from the assumptions made on b.a 

Finally, we state a Lemma which will be useful for the proof of Theorem 2. For a matrix-valued process $(H_t)$ measurable with respect to the sigma field $\mathcal{F}_t$, we will use the equality $H_t = o_P\left(\frac{1}{\sqrt T}\right)$ when $\max_{p+1 \le t \le T} \|H_t\| = o_P\left(\frac{1}{\sqrt T}\right)$.
We also introduce additional notations. For $j = 1,2,3$, we define $S^*_{j,t}$ as $S_{j,t}$ replacing $W_t$ with $W^*_t$.
Lemma 6. Assume that the assumptions of Theorem 2 hold. 

1. For j = 1,2, we have 

qjj - qj,t - qj,t + E-\S3,t)Sj,t - E-\S3j)S3,tE-\S3,tmS jj) + op 

and 


2. We have 


with 


\\qj,t-qi,t\\ -op(^|. 


i 11 - + o; (^ - /J))+op 


0-, 


crt 


Ft = q\j - qi,t - {qi,t - q2,t)p. 

In particular, we have W* - ^ (1 + Et) with = op j. 

3. We have maxp+i<,<r HSjjH = Op(l). 

4. For j - 1,2,3, there exists two matrices ap and cp such that 




where 


Rp = ap (P-I3)+Y ^FCj.iFi = op ■ 


i=p+\ 


(C.l) 


Moreover, iP ^ op 
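5. For $j = 1,2$, we have $\hat q^*_{j,t} - q^*_{j,t} = F^*_{j,t} + A_{j,t}$, where

\[ F^*_{j,t} = \eta^*_{j,t} + \mathbb{E}^{-1}(S^*_{3,t})\bar S^*_{j,t} - \mathbb{E}^{-1}(S^*_{3,t})\bar S^*_{3,t}\mathbb{E}^{-1}(S^*_{3,t})\mathbb{E}(S^*_{j,t}), \]
\[ A_{j,t} = \mathbb{E}^{-1}(S^*_{3,t}) R_{j,t} - \mathbb{E}^{-1}(S^*_{3,t}) R_{3,t}\mathbb{E}^{-1}(S^*_{3,t})\mathbb{E}(S^*_{j,t}) + o_P\left(\frac{1}{\sqrt T}\right). \]
Moreover, $\|\hat q^*_{j,t} - q^*_{j,t}\|^2 = o_P\left(\frac{1}{\sqrt T}\right)$.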


Proof of Lemma 6 


1. Under the assumptions of Theorem 2, we recall that $S_t = o_P\left(T^{-1/4}\right)$ for all univariate smoothing (see Corollary 3). Then, applying this property coordinatewise, we have $\|\bar S_{j,t}\|^2 = o_P\left(\frac{1}{\sqrt T}\right)$ for $j = 1,2,3$.

The announced decomposition is then a consequence of decomposition (6.4) given in the paper and of Lemma 5, point 4. Next, the second assertion follows from the condition $\sqrt T b^2 \to 0$ and Lemma 5 (point 3).


2. We use the decomposition 


w; 


0-7 


1 + 


0-7 


0-7 


Vj 


0-7 


cr4(d-f + yr) j 


(C.2) 


Now, using the fact that Lj + \\p - /3|p - op (we get 


a-^ - cr'f + — (o', (p - + M[L^ + 


From this decomposition, we deduce that ^ 3 ^ 


= Op(l). Then we get 


w; 




0;(yS-/3) + M;Lf) + op 



which also yields to the approximation IT* = ^t) with - op 

3. We have 


^ MM 

i=p+l 




= si, + of{iy 


Indeed, maxp+i<,<r is bounded uniformly in (co, T) and from point 2, maxp+i^f^r HFfH = op(l). 

Then the result follows from the fact that maxp+i^f^j- IKS^ ^)“^|| = 6 )p(l). 
4. We only consider the case $j = 1$, the cases $j = 2$ or $j = 3$ being similar. Using point 2, we have
the decomposition




0'.(p-l3) + M'.Li 


i=p+\ 


cry 


MiX- + Op 


(*l 


Now we set ai,f = -2'Lj=p+\ |j and cy; = -'^Tj=p+i j. Moreover, we have 


from Lemma 3 


which leads to 


-2 i; k 

i=p+l 


-ai, - opir 4 


(rf 


-2 2 


MiN'.Xf 

i g 


i=p+l 




Then it remains to show that 


Z k, 

i=p+\ 


MMX} 


(rf 


- ci,i 




(C.3) 


Considering the decomposition given in point 1, assertion (C.3) will follow if we show the two follow¬ 
ing assertions. For all real-valued sequences (c?) such that maXp+i^j^T- Ct = 0{b) and all real-valued 
processes (Tf), (G?) of type / or II, 


Z ^kiCiyi = oJ^ 


i=p+\ 


y/f 


(C.4) 


and 


2 

i=p+\ '■ ’4 

where 

T 

St=Yj Ki(Gi-E{Gi)). 

i=p-¥\ 

The assertion (C.5) can be proved as follows. For e > 0 and an even integer /i > 0, we have 


(C.5) 


max 

p+\<t<T 


i=p+\ 


> e 


Vr 


T 

T2 


2] E| 2] ktjYiSi 


t=p+\ i=p+\ 


T2 


? + l 


Z P‘iPj' ■ ■ ■ Pii^PJh\'^ [YhGj, ■ ■ ■ Yi,Gj,) I, 
where pi - maxp+i<,< 7 ’ kfj. Using Lemma 4, we deduce that the right hand side of the previous inequality
is O —j But I ~ 1 - I (l ~ I) ^ I (1 ~ when h > ^. Using the bandwidth conditions,

we get (C.5). Next, using Corollary 3 and the bandwidth conditions, we get 

T 

max I } ktiCiSi\< max |cj| x max \St\ = ov\bT~^\ = Of 
p+\<t<T ’ p+\<t<T p+i<t<T \ / 

i=p+l 

This proves (C.4). Finally, since ||L,|p = op j andyS-yS = Op the decomposition (C.l) holds 
true. Using some arguments discussed before, we easily get ||5* ^ |p = op 

5. Using the equality 

+ {^h - 

the result of point 5 is an easy consequence of the previous points. □ 



D Proof of Theorem 2 

We recall that for a triangular array $\{H_t = H_{t,T}\}$ of matrices, we will denote $H_t = o_P\left(\frac{1}{\sqrt T}\right)$ when

\[ \max_{p+1 \le t \le T} \|H_t\| = o_P\left(\frac{1}{\sqrt T}\right). \]


Notations. Let us also recall the following notations. 


t=p+l i t=p+l 


MiN'. 
i ~ ■ 
O"? 


^ MM 

crT 




t=p+\ 

ril,-W\SlM(siyql„ 


cr. 


crl 


- 


jMtM'A iMtN' 

1 ' ' TQ ‘ f 


err 


cr; 


OMNt-(qy M,. 

The proof of Theorem 2 uses the same decomposition as for Theorem 1. More precisely, we have 

^ = bj^ {i-ijP - i-2jP - L-ij + L^j + L^j - , 
where


A,T - 


L2,T = 


2 W:o:M'(ql-ql), 

=p+l 

T 


t=p+l 

T 


Uj = Y 

t=p+\ 

T 

r=p+l 

T 

Uj = Y "^fO^Ztcrl 


t=p+\ 

T 


- 6 ,r - 


2 w:(r2,-qh)' M.Z^ctI 

t=p+l 


Dt= Y + ^2,r- 

t=p+l 

Using Lemma 6, it is easy to get 

ll^2,rll + ll^4,rll - Op (Vr^. 

Then the proof of Theorem 2 will follow from the three following points. 

1. We first show that Ljj = op ( Vt^ for j = 1,3,6. We only consider the case j = 1, the proofs for the 
cases j = 3, or 7 = 6 being similar. Using the notations introduced in Lemma 6 (points 2 and 5), we 


have 


Lij- Y 

t=p+i 


o;m' 


{n, +a 2 .) . 


The proof will be a consequence of the following points. 

0*M' IT’ O* M' 

• Since is centered, we get Z(=p+i ~^^ 2 t ~ '2p(l) The proof is similar to the proof 


given in Theorem 1 (see the proof of -^Njj - op(l) for j - 1,3,6). 


1 T 0*M' 

Next, we prove that Zf=„+i - op(l). First we will show that 


Vf ^t=p+l cr* 


1 ^ o*m; , ^ 
vf 2] ^E-'(5= »,( 1 ). 

’2 t=p+\ 


Using Lemma 6, we have the expression 


T 

^2,t ~ ^2,t{p ~ ^ J ^t,iC2jLi- 

i=p+\ 
Since is centered, we have


1 W*M' , ..V 

-J= 2j ^T^E-i(S;>24/3-yS) = op(l). 


Vf cr; 


=p+i 


Then, it remains to show that 
T 


1 V—! OtMf I V—! 


Vr 


(D.l) 


i=p+l 


The assertion (D. 1) will follow if we work with each entry of different matrices and if we prove 
the following property. If (T,) is a real-valued centered process of type I or II, (s is a real-valued 
smoothing of type I or II and (c,), (ct) are deterministic sequences satisfying maxp+i^^^j- \ct\ - 
0(b) and maxp+i^j^r \ct\ = 0(1), then 


T T T 

j= Y,ct = op(l), ^ Z Z 


Vf 


(D.2) 


t=p+i 


The first assertion in (D.l) has been already proved using the bound for the covariance function 
of the process (Tf) (see the proof of -^L^j - op(l) in the proof of Theorem 1). Then it remains 

to prove the second assertion in (D.2). Writing S,- = Zj=p+i ^ij^'j< we have 


:p+l i=p+l 


Vf ^ 

^ t,l= 


J'=p +1 


with pt „/ = lJi-p.i Cikt^ikjj satisfies maxp+i<j_,< 7 - |p,j| = Moreover, using Lemma 2, we 

have for a generic constant C > 0, 

T T 

fj=p+i n,f 2 jij 2 =p+i 

T 


c 


< 


C 

Th^' 




n,f 2 ji J 2 =p+i 


Using the assumption Tb^ —> oo, we deduce the second assertion in (D.2). This proves (D.l). 
Finally, there exists a constant C > 0 such that 

1 ^ 0*M' , . 1 ^ \\0*M[\\ 

7^ Z ^ II ^ -77^ Z I^'l"" II^T. - ^1^11- 


Vf t=p+\ 

Then using Lemma 6, we deduce that 


Vf <^7 


p+l<t<T 


p+l<t<T 


1 ^ 0*M' , . 

-i= Z = 

t=p+i 
2. Next, we prove the following convergence in distribution: -^L^j N„ (O, Var We have



= Aj + Bj. 


The convergence of {At) is obtained as in the proof of Theorem 1. Moreover, using the expression 
of Et given in Lemma 6 and arguments which are now familiar, the convergence Bj = op(l) can be 
obtained. 

3. Finally we show that ^ > E a.s. It just remains to prove that 


1 

T 



1 

f 


T 


z 

t=p+l 


o; (o;y 


0-7 


E. 


Using the fact that maxp+i^f^j- HFfH = op(l) and % is uniformly bounded in t, T, oj, the first assertion 
follows. The second assertion has been already shown in the proof of Theorem 1 with general weights 
W,n 


E Proof of Theorem 3 


The proof uses the arguments given in Fryzlewicz et al. (2008) (see Proposition 3 of that paper). We have 
the decomposition 


at - a{u) - -q2,b',t (p - p) + (Af(u) - A*{u)) + S 3 j,,/lf(n). 
Note that 


(E.l) 


A%u) - 2] Wt,i{b')Wi{u)Mi{u)crj{u)Zi 

i=p+l 

and using the central limit theorem for martingale differences, we obtain as in Fryzlewicz et al. (2008),
'Jfb'A^iu) Afy^O, Var(^ 2 ) • JK{xfdx • e(Wi(m)o-i(m)'*Mi(m)Mi(m)')J . 

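From Theorem 1, we have $\hat\beta - \beta = O_P\left(\frac{1}{\sqrt T}\right)$. Moreover, using Lemma 5 (point 4), and the fact that $S_{2,b',t}$ is uniformly bounded in $t, T, \omega$, we get

\[ \|\hat q_{2,b',t}\| \le \max_{p+1 \le t \le T} \|S_{3,b',t}^{-1}\| \cdot \max_{p+1 \le t \le T} \|S_{2,b',t}\| = O_P(1). \]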
p+\<t<T ’ p+l<t<T 


This leads to


q2,b‘ 




- Op 




(E.2) 
Moreover we have


lim S-i^b',t = E(Wi(m)Mi(m)Mi(m)') a.s. 


(E.3) 


To justify (E.3), it is sufficient to show that for every real-valued process {Yt)p+\<t<T of the form F, = 
with f bounded and Lipschitz, we have ktj{b')Yi —> E(Fi(m)). Using Lemma 1,

the Lipschitz property of f and the properties of the kernel K, it is easy to get




vergence (E.3). 

The convergence in distribution announced in Theorem 3 now easily follows from (E.2), (E.3) and 
decomposition (E.1). □

F Proof of Theorem 4 

In this proof, we will set 




^ X\at, dji{u) = Xi{u)'at. 


As in the proof of Theorem 3, we have the decomposition 



• We first show that ^ probability. We set f_ = max(l,f - Tb') and U 

min(r, t + Tb'). Define the following event 



Note that we have. 



and is a random variable bounded by a constant C > 0. Then we have the inclusion 
But


max lla; - a,|| < \\at - a{u)\\ + max \\a{u) - a,|| < Ci 

t-<i<t+ 

for a suitable constant Ci > 0 and from Theorem 1 and Theorem 3, ||df - a(u)\\ = 0^\b' + 

This yields to maxj <,<(+ \\at - a,|| = Op {b' + Then we conclude that 0. 

event Atj, we have 


Wat - a{u)W +b' + - 


V^} 

On the 




< 




But since -y - 1 = ^ (dj - a,), we conclude that 

(T^ (T- 

% = ^i^+Ftj), 

with maxj_<,<(+ \Ft,i\ - op(l). Using the properties of the kernel K, this yields to 


h,b',t = > , —— + op(l). 

(T 

i=p-¥\ i 

Using the arguments used in the proof of Theorem 3, we have YjJ=p+i ^ 

probability and the last expectation is also the limit of s^^b',t- 


• Next we show the convergence in distribution 

T 

Yj ktj{b')%{u)Mi{u)(ri{ufZi^N,niO,A^*{u)). 

i=p+l 

As shown in the proof of Theorem 3, this convergence holds if W*^(u) is replaced with in the 
last expression and it remains to show that 


T 

X kt,i{b')[%{u)--^ 


i=p+\ 


o-iiuf 


Mi{u)cri{uy'Zi 0 


in probability. We will use the following equality 


to 


WUu) 


=z 


{o-iiuf - - /upf {o-iiuf - d-piiuf - fir) 


^0+1 


^ cr,-(M)4^+4 


CT; 




First, it is easy to show that for ^ = 1,..., 
T 


I - , {o-iiuf - d-t,iiuf - /ir) 2 / 1 \ 

ffb' y kpiib'y- -- ’-Miiu)criiuf(y^ - l) - op(l). 

i=p+\ 


(F.l) 


(F2) 
Indeed, using the equality


a-Uu) 


^i{u)Xi{u)' „Xi(uy 

- 1 ^ (flf - ai) -—— {at - at) + 2 {at - ad , 

(Ti{ur o-i{uy 


o-i{u) 


developing ( 


\a-i(ud 


1 ) and using the decomposition at - a,- = at - a{u) + a{u) - a, it is clear that 
^ (cr,•(«)'* - &t,i{^f)^ 


2 Ki{b')- 

i=p+\ 


o-i{uy 


4U4 


-Mi{u)cri{u) Zi 


is composed of terms which write as a product of factors involving some coordinates of the vector 
at-at, ht and of one factor of type X ]=p+i kt,i{b')Ou,t,i (ff - l) where Ou. xj is bounded and measurable 

w.r.t (T{^i-s : 5 > 1). Such a term is op | because at - a{u) = op(l) and since {OuxjZdp^^^-^j. is 
a sequence of martingale differences, we have 

T 


y ktx{b')Ou,t,iZi^oJ-^. 


This proves (F.2). 

Now we consider the remainder term. Recall that maxj </<,+ \\at - a,|| = Op \b' + -^=). Using the 
assumption on b', this entails 


max 1 - 


&t,i{uf Ht 


t-<i<t+ \ (Ti{uY 0-i{u)^ 
If we define 


^0 + 1 


- Op 




(F.3) 


Atj{u) = \ dj.{u) > -<Ti{uf }>, 


we have V{Atj{uy) ^r^oo 0 (the proof is the same as for Moreover we have, using (F.3), 

\fo + l 




^ ktj{b')Mi{u)(r^{u) 

i=p+l 


< 4 max 

t-<i<t+ 


1 - 


d-t,i{^f 


(cr,(M)"^ - d-t,i{u)^ - larf 
o-i{uf^^+‘^&tx{uf 

||M,-(n)||l ^ 


(Ti{uf 0-i{ur 


o-i{uy 


■ Y, ktj{b'M^ - 1| 


i=p+\ 


- 


1 


^IfF 

Then, we conclude that 


X kt,i{b') 

i=p+l 


, (cr4(M)-d-4.(M)-/l7’) 


4 + 1 


o-f 




-Mi{u)(Ti{u) Zi ^ Op 


(Vw)' 


This shows (F.l). 
Finally, we show that Bt{u) - Bf{u) = Ofib'). We consider the event Atj = Atj n Atj{u). We have
of course lim 7 -_,oo P ((Aj will just show that


Ht = h,b',t - Yj kt,i{b')%{u)Mi{u)Xi{uf = Of{b'), 

r=p+l 

the control of the difference being the same for the two other quantities S 2 ,b',t and S 2 ,b',t- Since
P (A(,r) ^ 0, we have = Op(^')- Now, on the event Atj, we have


7 * w* 


where 


t{u)Mi{u)Xf{u) - 




^^ti+MT 

F, = - 


Xiiuf 



+ Bt 

Gi = - 

Mi 

Miiu) 




Xiiuf 


--Gi, 


+ Bt 


On Atj we have for a suitable constant C > 0,

IP’,I 


\xl-Xi{u)\ ^2 

~ crj crj ' crj + ajiu) 


< C 


Moreover, 1-. 


A.T 1/^4 

constant D > 0,


\xf - Xiiufl + \\atf^^ Y 

e=\ 

, is bounded uniformly in $t, i, T, \omega$. Then, we obtain for a suitably chosen




Mi 


Y . _- 

‘ ^ / /s 4 

,=p+l ^O-^.+liT 


P T 


< D 2] ktj{b')\xf-Xi{uf\ + \\at\fY Y 

1 /=/?+! £=1 i=p+l 


Using the fact that $\|\hat a_t\| = O_P(1)$, the support condition on the kernel $K$ and Lemma 1, we conclude
that

T ^ 

Y ^M'(P')-7==P’,- = Op(P'). 

,=P+1 yjd-^j+fiT 
Similarly, we also get


, Xiiuf 

Ij,, L ) , =Gi = Odb'). 

i=p+l ^&'l.{u)+flT 

This proves that Ht^x,T ~ and then Ht - O^ib'). The proof is now complete.□ 

G Auxiliary results for the proof of Theorem 5 

In this section, we assume that the assumptions of Theorem 5 hold true. The quantities studied in the 
following lemma are defined in (H.l). 

Lemma 7. We set 3 = [0, U [1 - b, 1]. 

1. We have max„g[o,i] ||E(B„)|| = 0{b) and for all positive integer q, max„g[oj] E = O 

2. max„g[o,i]E(||A„||4) = o((rf.)-2). 

3. We have for all positive integer q, 

maxE||S„ - - 0(1), max E||S„ - Kuf^ = oib^ + {TbT^^) . 

ueS ue[b,\-b] '' ' 


4. WesetRu = RuS-J. Wh/rave max„g[o,i] ||S-'|| = Op(l), max„g[i,,i_fo] e(||/?„||2'?) - o(b^‘^ + {Tby^'i) 

a/rdmax„gsE(||/?„||2^) ^ 0(1). 

Proof of Lemma 7 

1. The first assertion is obvious. For the second assertion, we use Lemma 2. It is sufficient to show that 
for some subscripts y, w, 

max E| V ^ o(^], 

«£[0,1] \T‘! 

l=p+\ ' ' 

where zfu) = efu) {aw{ilT) - a„{u)) satisfies 

max \qi{u)\ = o( — | and max V zfu) = 0(b). 
o<r.Hern.n ^ 7 uero.n Z-l ' 


p+l<i<T,ue[0,\] 


i=p-¥-\ 


Then the result follows from Lemma 2 if we write the moment of order 2q as a multiple sum. 

2. The result follows from Proposition 3. 

3. Using Proposition 3, we have max„g[o,i] E||5„|p^ = O ((y^)- Moreover for the bias part 

T 


^ T ^ 

E(S„) -Ku= Yj u(«) (e (WiXXi) - ^«) + Y 1 

i=p+\ \i=p+l 


Ku. 
The first term is


bounded hy c{b + y ) where C > 0 does not depend on u, T and 


max 

uel 


T 

z=p+l 


is 0{^Y^ or 0(1) when I = \b,\ - b\ ox I = [0,1] \ 1 - b] respectively. 

4. Since Sf/ 7 - = ^ Z,Lp+i KWiXiX'-, note that if |m - 4| < 1, then ||S„ - Sf/rll < where C > 0 
does not depend on u, t and T (we recall that WjXiX'. is bounded). Then it is sufficient to show that 
maxp+i^f^p = Op(l). This can be proved as in Lemma 5, point 4. Details are omitted. The 

other assertions are a consequence of the previous point and of the bound ||/?„|| < CUSh - Ku\^.u 

Lemma 8. Let (r(M)^ be a family of positive definite matrices such that u i—> r(M) is Lipschitz. Set 
= Jq A'Ji'(u)Audu. Then 


where 


Moreover 


r V^(^r -E(^r)) N(OA\\K%Var^(e^c), 



tr 


((r(u)G(u))") 


du. 


G{u) = e(iTi(m)Vi(m)'*^i(m)^i(m)'). 


E(^7-) = 


Var(e^\\K\\l 

fb 


J" tr(r{u)G{u)'jdu 


+ o 



Proof of Lemma 8 We set for i,j € Up + 1, r], Qij = ei{u)ej{u)T{u)du. Moreover, let J/; = WicrJXi- 
Then 

T 

^T= Yj 

i,t=p+l 

We use the decomposition "Vp - 'XV\j + 'T'2,7’ with 

- 2 ] ZcZiX'iQiyXt, 

p+\<t<i<T 

and 

T 

^2,T = Y 

i=p+l 

Note that E(n/r) = E(^2 ,t) and max,/ 1 | 2 //|| = o(^)- 
• We first show that 


T^fb{'V2,T - E("V2,r)) - op(l)- 


(G.l) 


To show this, we decompose 


'Vzj - E('V2,7’) = 'T^2,i,r + E(Z^)'y2,i,r, 
with


T 

^2XT= Yj (z- 

i=p+\ 

T 

^1,2,T = Y 

i=p+\ 

Since (J/;) is bounded and 'V 2 ,ij is a sum of uncorrelated random variables, the second moment of 
*1^2,i,r is Then T yfb'V 2 ,]_j - op(l). Moreover, one can decompose 'y 2 , 2 ,r as a finite sum of 

terms of order Op j. This can be easily seen taking the second moment of these terms and then 

using Lemma 2. This leads to (G.l). 

• Next, we prove the assertion on E('T' 2 ,r)- We have 


E(^2,r) 


T 

Var(^2j ^ E(d/;eMd/,) 

i=p+\ 


Var 


Var 


(W 


K- 

K'- 



E (j/i {i/TY r(n)d/i (i/T)) du + 01^) 
duB.(}Yi (//r/f (//T)) + . 


Next, note that for a Lipschitz function / : [0,1] —> R, 


1 

Tb 




f{u)du\\K\\^ ^0\b + — 


This yields to 


E(^ 2 ,r)- 


Var 

Tb 



B(}fx{uYT{u)}fi{u))du\\K\\\ - O 




But using the fact that Tb^!'^ — > oo, the last quantity is 

• Finally, we study the convergence in distribution of 'V\j. The argument is to apply the central
limit theorem for martingale differences. First, we show the Lindeberg condition. We set !P, =
Zr=p+i Then = Tj=p+i 'PiZi- It is enough to show that 

4 ^ 

(rV^) Y H'Ptzt) - o{\). (G.2) 

i=p+l 
Using the independence between Z, and !P,, the fact that d/, is bounded and the Burkholder inequality,
we have for a generic constant C > 0,


r-l 


E(pfzf) < CE II 2] 

, ^=p+i 
i-l 


< c 


= o 


2 \\Qi,ef 

t=p+\ 


{T^b^y 


This leads to (G.2), using the condition Tb^ —> oo. Now, to obtain the convergence mentioned in 
Lemma 8, it remains to prove that 

T 

T^bE(zl) Y, 'Pi- (z^)^ = op(l). (G.3) 

i=p+l 

We will use the following decomposition 

T T 

Y,Pl=T. ^<+2Bi+B2, 

i=p+l i=p+l 


where 

i-l 

€,m=p-y\ 

T 

^ ei{u)ei{v)ei{u)em{v)Bi^(^m{u, v)dudv. 



with 

and 




B\ = ^ Ze}f'^Ee^m^mZm, 

p+\<i<m<T 

T 

B2= Yj 

e=p+\ 


T 

Ee,m= Yi QitE{^i^'i) Qi,m 
We also set pi^e,m(u,v) = ei{u)ei{v)e({u)em{v) = 













We first show that T'^bYj=p+i ~ have 

i=p+l 
1 r*\ T' 


< 


L L Z Z Z Pi,£,m{u, V)pi'/'^m'{u, v)|E v)Bi> ^ni'{u, v)) \dudv 

Jo Jo ij>=p+ip+\<f^m<ip+\<{',m'<i' 


<- ... Z Z PUi,mi.U, V)pi'/'^m'{u, V)|E v)) \dudv, 

J\u-v\<2b i i meMu i',t’,m’eMu 

where At„ = € |[/7 + 1, r| : |^ - m| < 3t?|. Then the last quantity is 





Indeed, Pi/^m{u, v) can be bounded by (up to a constant) and we can apply Lemma 2 with
ZT,i = Since ^ ^ ^ 0, we get the result.

Next, we show that T^bBi = op(l). Using the fact that is bounded and the Burkholder
inequality, we get


T m-\ 

E| 2 ] ^ ^ \\EtJ\^ <C \\Et,mt. 

p+l<t<m<T m=p+l t=p+l \t-m\<4Tb 

Since maxr^m \\Ei,m\\ - O (^), we find that

E| ^ Ze}f'^Ee^m^mZj\ - O 

p+\<i<m<T 

This proves T^bB\ = op(l). 

Finally, we study the convergence of T^bB 2 . First, observing that maxp+i <^<7 ||£'f/|| = O )
and using Lemma 2, we get 

T 

{zji/'iEe^ii/e - E(z2d/;£f,^d/^)) = Op 

e=p+i 

Then it remains to show that 

T 

T^b Y E (^'tEu:^'c) - \\K*tC = op(l)- (G.4) 

£=p+\ 

We first note that 

T 

Y E(d/'£^yd/, 

£=p+l p+l<£<i<T 


)= Y t*" (® Qid) ■ 
Next, one can verify (we skip the details) that one can replace with and Qii

with ei(u)ef(u)dur(^ y) without changing the limit in (G.4). Then it remains to study the limit 


of 


where 


and 


Note first that 


1 

T^b ^ feST,e, 

£=p-\-i 

ft= tr |[E(d/,d/;)f(^)]j, 

(=r+i 


ei{u)e({u)du \ = 0\ 


T I ^ 


p+\<t<Tb {=T{\-^b) 

Moreover, ifrf7<f’<r(l - 3b), we have 

e+2Tb 


1 / * ~ 

- 2^ f^^\Tb } ’ 

i=£+l ' 


where H(x) = J_j W{v)W{x + v)dv. Since 


and 


we get (G.4).n 


Til-'ib) 


j 2 Vi ^1 . 2\ 

tr [E(d/i(n)d/i(n)')r(n)] 

e=Tb 


H Proof of Theorem 5 

The proof of Theorem 5 uses two lemmas, Lemma 7 and Lemma 8, which are stated in Section G. We
set^t, = Ak~^ and d/,- = WicrjXi. Under the null assumption (fi is non time-varying), we use the following 
decomposition of the difference P - p. 


p{u) P — KuA.{u) -|- KuBu + Zu (^M S u) ■ \Bu "I" ^m] ^ 

where 

T T 

K= Y, B,= Y ei{u)WiXiX] {a{i/T) - a{u )), 

i=p+l i=p+\ 

T 

Ru =l^u{Ku - Su)k-2 {Ku - Su)Sl\ Zu =^U Y - Ku\ ' [a{i/T) - a{u)]. 


(H.l) 


i=p+l 
Note that maXj,g[oj] ||z„|| = o(^y + = o(^b^y The first part of Theorem 5 will follow from Lemma 8, if

we show that 


J" (/?(m)-/3) r(u)(jB(u)-jSjdu-J' A^/r(M)T(M)/c„A„r/M = op|-^ 

We set H(u) - l<uT{u^u- To show (H.2), we use the previous decomposition and proceed as follows. 
1. We first show that 


(H.2) 


Jo 


H{u)Kudu = op(l). 


We have for a suitable constant C > 0, 


ME [ 0 , 1 ] 

Then using Lemma 7, we get 

and (H.3) easily follows. 

2. Next we show that 


max < C max Je(||B„|| 2) • E(||A„|p). 


max e|b[,//(m)A„| = 0\ — 


ME [ 0 , 1 ] 


Jo 


H{u)Budu = op(l). 


Using Lemma 7, we have max„g[o,i] E||B„|p = O and (H.4) easily follows. 
3. Next, we prove that 

1 


fz;, 

Jo 


r(u)A:uAudu = Op 


We have 


T^/b)' 

E| J' z'^r(u)JuAudu\ = O I max ||z„|| • ^E(||A„|p)j . 


Then using Lemma 7, we get 


e| J" zJ'{u)KuAudu\ = O 




This leads to (H.5), using our bandwidth conditions. 

4. Under our bandwidth conditions, we have T yfb z'J'{u)zudu = O (Tb^-^^ = o(l). 

5. Next we show that 


fK‘ 

Jo 


H{u) {ku - Su)kJ (Bu + A„) du = op 


T^/b 


(H.3) 


(H.4) 


(H.5) 


(H.6) 
If $I$ is a subinterval of $[0,1]$, we have for a positive constant $C$,


e| J'KH{u) (Ku -Su)k^^ [Bu + Au)du\ 

< C • |/| • maxEi/3 (||a„||3) . . (e'/^ (||b„||3) + e^^^ (||A„||3)) . 

When $I = [0,b]\cup[1-b,1]$, the last bound is O using Lemma 7. Then,

J^A^77(m) {ku -Su)k~^ (b„ + Ah) du = op 
When $I = [b,1-b]$, Lemma 7 leads to

Ja' 77(n) {K, -Su) K-J (Bh + A„) Bn - O • |ft + ^ 

Using our bandwidth conditions, this yields

J' A'^H(u) {ku -Su) (B„ + A„) du = Op j • 

Moreover, if $I = [0,b]\cup[1-b,1]$ or $I = [b,1-b]$, we have

e| J^KH(u) {ku - Su)K-jE{Bu)du\ 

< Cb\l\ max VeIISh - Ku\\^ max Veiiajp 

uel MS [0,1] 

< C-^|7|max Ve|| 5„ -/r„||2. 

Vr uel 

Using Lemma 7 and our bandwidth conditions, we also obtain A[,77 (m) (/(•„ - S u) /^u^E(Bu)du - 
op (^) and then (H.6). 


6. Next, setting Mu ^ Ku (Ku -Su)Ku^ (Bu + Au), we show that 
rVft r Mur{u)Mudu = op{\). 

Jo 


(H.7) 


Using Lemma 7, we have 

E||M„||2 < c ^E(|k„-5„||4).E(||B„ + A„||4) < C ^EOkn - 5„||4) [ft^ + i 


Then, studying $E\left|\int_I M_u'\Gamma(u)M_u\,du\right|$ when $I = [0,b]\cup[1-b,1]$ or $I = [b,1-b]$, (H.7) follows using
the previous bound, Lemma 7 and our bandwidth conditions.
7. Next we prove that


rV^ 


['k 

Jo 


r(u)Ru (Su + A„) du = op(l). 


(H.8) 


When $I = [b,1-b]$ or $I = [0,b]\cup[1-b,1]$, we use the bound


2 

I I A'j:{u)Ru {Bu + Au) du\ 
Jo 


\\Ru\\ du. 


Moreover, using Lemma 7, we have 

E r||AJ|2 (||B„|P + ||A„||2) du < 1\1\ max Ve||A„||4 • Ve||B„|| 4 + E||A„||4 = |/| x O (^ + ,. 

Jl ' ' i<e[0,l] \1 {TbYj 

Now, if $I = [0,b]\cup[1-b,1]$, we have $\int_I \|R_u\|^2\,du = O_P(b)$ and we obtain from the previous bounds

I A'^r(u)Ru (Bu + Au) du^ ^ Op = Op j . 

Now, if 7 = [ft, 1 - ft], we have, from Lemma 7, ||7?„|p(iM = Op (ft^ + and then 

I £ A'^r(u)Ru (Bu + A„) du\^ = Op 11^ + ^ 


(rft)2 


• Ift^ + 


(Tb)^ 


This is clearly op under our bandwidth conditions. Then (H.8) follows. 
8. Finally, setting = 7?„ (Bu + A„), we show that 


T 

We have 

Moreover, 


Vft C M'u 
Jo 


T(u)Mudu = op(l). 


(H.9) 




Y(u)Mudu < max ||5'„'||^ • f ||B„ + A„||^ • ||7 ?„||Vm. 
«6[0,1] Jj 


I 


\\Bu + Au\^-\\Ru\\^du 


< |7| • max ^EM • ( + Ve||A„||4) 


' max 

uel 


VeM- 




Considering the two cases for $I$, (H.9) follows from Lemma 7 and the bandwidth conditions.
The proof of the first part of the theorem is now complete. Next we prove that the asymptotic distribution of our statistic remains the
same if we replace $\beta$ with the estimate $\hat\beta$ of Theorem 1. Since $\hat\beta - \beta = O_P(1/\sqrt{T})$, we have

ST{a,p) - ST(a,P) - -2It + Op 



where Ij = £ (p - p) r(n) (p{u) - p) 


du and it remains to prove 




Iy — Op 



Using decomposition (H. 1), we have already shown that 



-p- KuAu) Tiu) (piu) -p- KuAu) du 



Using Cauchy-Schwarz inequality, it is easily seen that 



{p - p) Tiu) [piu) -p- du 



(H.10)


Then to show (H.IO), it remains to show £ r(u)A:(u)A(u)du - op We have 


f 


T{u)K{u)A{u)du 

0 

^ J Tiu)Kiu)eiiu)du ■ JJiZi 


i=p+\ 


- Of 


Vf 


The last equality follows after noticing that maxp+i^/^y || £ r(u)K(u)e,(u)dull = 0^^^ and applying Lemma 
2 componentwise. Then we get (H.10). The proof of Theorem 5 is now complete.□


I Proof of Proposition 3 


Setting Ht = Xf - dt and Hf - Xp - BXf for p + 1 < f < T, we have 

' J '^-1 T- 


p = (du...,Up)' - 2] Hir; ^ H,H, 


V'=P+1 / 


t=p+l 


/ a . \ ^ 

where 'Ht = , Ht-p\ . We use the decomposition 


T T 

H, = H, + EX, - 2] k,jEXf - Y, kiHi. 

i=p+\ i=p+\ 
To prove the result, we will show that


1 


T T 

2] 2] kr-,jHi - oP(l), 

t=p+l i=p+\ 


S < t 


and 


1 

Vf 


T T 


Z Z 

(=p+l I=p+1 


T 

^ kt-sjHi = op(l), 
i=p+l 


S <t. 


(I.1)


(I.2)


Using (I.1) and (I.2) and some arguments given in the proof of Theorem 1, one can show that $\sqrt{T}\hat\beta$ has
the same asymptotic distribution as the same estimator with $\hat H$ replaced by $H$. Then one can deduce
that VTjS converges to a p-dimensional Gaussian vector with distribution Np{0,(r^lp^ and the result of 
Proposition 3 easily follows. Let us prove (I.1) and (I.2).

1. For (I.1), we decompose the sum into two terms,

T T T 

t=p+l i=p+\ t=p+\ ii=t 

The expectation of the (positive) first term is smaller to Clb where C is a positive constant. Under the 
assumptions of Proposition 3, we have Tb^ 0. Then the latter expectation is o ^ Vt^. The variance 
of the second term is 


2] 

= Yj (til) E (^?) + Z ® (^?) ■ 


It is easy to show that this variance is of order o(b = o{T). This shows (I.l). 

2. Next we show (I.2). The $H_t$'s are independent. One can use the results given in Zhang and Wu (2012),
Lemmas A1 and A3, from which we deduce that


max 

p+l<t<T 


Z 

i=p+l 


Op 7 2(1+4) + 


Vrf^iog(r)) 


Tb 


Using our bandwidth conditions, this entails (I.2).
The proof of Proposition 3 is now complete.□ 


J Asymptotic semiparametric efficiency 

J.l Proof of Proposition 2 

We set gt = g(tlT) and e, = Then using the inequalities 

1 

0 < 1- x + < x^, log(l + x) - X + — < x^, X > 0, 

1 + X 2 
we have


^/P, 


log 




(V„....xr) = 

^r=P+l 


1 + ^ 
Vf 


,og|l.^ 


A7-^,,,-i 2 (^' -l)*' 


f=p+l 


+ ?T, 


with Irrl < H^p+i (^? + l) Since is bounded, we have rj = oPr,„^(l)- Next, observe that et is 

Lipschitz in . .,Xf_p. Setting Yt = Ztdj + we have ^ = f ZjLp+i and since 

{Yt)t is a processes of type II, we get from Lemma 4, 

E|f x w = oL 


t=p+i 

This entails j YJt=p+\ - opj.^^(l). Finally, noticing that Ct is a bounded and Lipschitz function in 


..., X^_^, we deduce from Lemma 1, 


T T 

f Z Z 


t=p+l 


2T ^ 

t=p+i 


where et{u) - 


M,(uyg(u)+N,(u)’h 


(r,(uY 


. From Lemma \,u ^ E(^ei{u)^^ is continuous. This leads to 




du = ||(g,/z)||H, 


where the limit is in PT’ Q.yj-probability. This completes the proof of Proposition 2.□


J.2 Proof of Corollary 1 

To prove the first assertion, it is enough to check the equality 
< k*v, (g, h) >H= h'v, 

for all (g, h,v) € H X R”. Using the notations E(u) = (with 

lM,{u)Mi{uy\ lMi{u)N,{uy\ ^/iVi(n)iVi(n)' 

we have q^yu) = Ei(u)~^E 2 (u) and it is easy to verify the equality 


S - 


r [Ej{u) - E 2 {uyEi{u)E 2 {u)\ du. 

Jo 


(J.1)


Then, it is easy to get (J.1) using the expression of the scalar product on $\mathbb H$. For the second assertion, we
apply Theorem 3.11.2 of van der Vaart and Wellner (1996), using the equality $\|k^*v\|_{\mathbb H}^2 = 2v'\Sigma^{-1}v$. □
K Numerical experiments for inference/testing

K.1 Example of semiparametric estimation.

We first illustrate the parameter inference methods in the semiparametric model with constant lag
coefficients. We consider the noise distributions $\xi_0 \sim \mathcal N(0,1)$, $t(9)$ (Student distribution with 9 degrees of
freedom) and $t(5)$. These three distributions satisfy the moment assumption used in Theorem 1, Theorem 3
and Theorem 4. The number of lags is fixed to $p = 2$ and the intercept function is defined
by $a_0(u) = 2 + \sin(2\pi u)$. We compare the estimates obtained using the procedure described in the paper
with the plug-in estimates, which are asymptotically optimal. Two sample sizes are considered: $T = 500$ and
$T = 1500$. Only one bandwidth is used, selected by the CV procedure (the same bandwidth is used for
estimating the intercept function, and the plug-in estimates are also computed using this initial bandwidth). Note
that the $t$-distributions do not satisfy the moment assumption for the asymptotic normality of the plug-in
estimator of the lag coefficients (in Theorem 2, we assumed that $\xi$ has moments of any order, but our assumption
is probably not optimal). The plug-in estimator seems to have a smaller RMSE (see Table 4), even when
$T = 500$. Observe also that our estimates are less accurate when the noise has fatter tails. The RMSE for $\hat a_0$

is defined by $E\big(\hat a_0(t/T) - a_0(t/T)\big)^2$.
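For readers who wish to reproduce this kind of experiment, the following Python sketch simulates a tv-ARCH($p$) path under the standard recursion $X_t = \sigma_t\xi_t$, $\sigma_t^2 = a_0(t/T) + \sum_{j=1}^p a_j(t/T)X_{t-j}^2$. The function name `simulate_tv_arch`, the burn-in length and the rescaling of the Student noise to unit variance are assumptions made for illustration; they are not taken from the paper. The usage lines take the lag coefficients $a_1 = 0.3$ and $a_2 = 0.2$ from the caption of Figure 5.

```python
import numpy as np

def simulate_tv_arch(T, a0, lags, rng, noise="gaussian", df=9, burn=50):
    """Simulate X_t = sigma_t * xi_t with
    sigma_t^2 = a0(t/T) + sum_j lags[j-1](t/T) * X_{t-j}^2.

    a0 and the entries of `lags` are functions of the rescaled time u in [0, 1].
    """
    p = len(lags)
    n = T + burn
    if noise == "gaussian":
        xi = rng.standard_normal(n)
    else:
        # Assumption: Student noise rescaled to unit variance (needs df > 2).
        xi = rng.standard_t(df, size=n) * np.sqrt((df - 2) / df)
    x = np.zeros(n)
    for t in range(n):
        # Rescaled time; frozen at 1/T during the (hypothetical) burn-in period.
        u = max(t - burn + 1, 1) / T
        s2 = a0(u)
        for j in range(1, p + 1):
            if t - j >= 0:
                s2 += lags[j - 1](u) * x[t - j] ** 2
        x[t] = np.sqrt(s2) * xi[t]
    return x[burn:]

# Usage: intercept of Section K.1 with constant lag coefficients 0.3 and 0.2.
rng = np.random.default_rng(0)
x = simulate_tv_arch(
    1500,
    lambda u: 2.0 + np.sin(2.0 * np.pi * u),
    [lambda u: 0.3, lambda u: 0.2],
    rng,
)
```

Passing `noise="student"` with `df=9` or `df=5` gives the two heavy-tailed designs, under the unit-variance assumption stated above.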


Table 4: RMSE for parameter estimation (* denotes the plug-in estimator)

                             N(0,1) noise               t(9) noise                 t(5) noise
T      Estimator          a_0     a_1     a_2        a_0     a_1     a_2        a_0     a_1     a_2
500    initial          0.5446  0.0859  0.0769     0.6380  0.1104  0.1000     0.7501  0.1732  0.1430
500    plug-in (*)      0.5068  0.0750  0.0651     0.5606  0.0949  0.0822     0.6489  0.1619  0.1167
1500   initial          0.3335  0.0473  0.0440     0.3844  0.0615  0.0557     0.5181  0.1012  0.0963
1500   plug-in (*)      0.3192  0.0433  0.0385     0.3571  0.0536  0.0471     0.4365  0.0775  0.0727



Figure 5: Estimation of $a_0(u) = 2 + \sin(2\pi u)$ when $\xi_0 \sim \mathcal N(0,1)$, $a_1 = 0.3$, $a_2 = 0.2$ and $T = 1500$ (the red
curve is the initial estimate and the yellow curve is the plug-in estimate)
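Estimation in this paper is kernel-based. As a rough illustration only, the sketch below computes a kernel-weighted least-squares fit of $X_t^2$ on $(1, X_{t-1}^2, \ldots, X_{t-p}^2)$ around a rescaled time $u$. It is a generic local least-squares estimator and not the weighted procedure of the paper; the function names and the choice of the Epanechnikov kernel are assumptions.

```python
import numpy as np

def epanechnikov(v):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * np.clip(1.0 - v ** 2, 0.0, None)

def local_ls_tv_arch(x, p, u, b):
    """Kernel-weighted least squares for (a_0(u), ..., a_p(u)).

    Generic sketch: regress X_t^2 on (1, X_{t-1}^2, ..., X_{t-p}^2) with
    weights K((t/T - u) / b). Not the paper's exact estimator.
    """
    x = np.asarray(x, dtype=float)
    T = x.size
    y = x[p:] ** 2
    # Design matrix: a column of ones, then the p lagged squares.
    cols = [np.ones(T - p)] + [x[p - j:T - j] ** 2 for j in range(1, p + 1)]
    Z = np.column_stack(cols)
    times = np.arange(p + 1, T + 1) / T
    w = epanechnikov((times - u) / b)
    WZ = Z * w[:, None]
    # Solve the weighted normal equations (Z' W Z) a = Z' W y.
    coef, *_ = np.linalg.lstsq(Z.T @ WZ, WZ.T @ y, rcond=None)
    return coef

# Usage on an arbitrary series (Gaussian white noise, for illustration only).
series = np.random.default_rng(0).standard_normal(500)
estimate_at_half = local_ls_tv_arch(series, p=2, u=0.5, b=0.2)
```

A smaller bandwidth `b` localizes the fit more strongly around `u`, at the price of using fewer observations.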

K.2 Testing the constancy of coefficients in a tv(1) process.

We consider here a tv-ARCH model with $p = 1$ and $\xi_0 \sim \mathcal N(0,1)$ or $\xi_0 \sim t(9)$. We consider two setups.
In Setup 1, we have $a_0(u) = 2 + \sin(2\pi u)$ and $a_1(u) = 0.5$. In Setup 2, we have $a_0(u) = 1$ and
$a_1(u) = 0.5 + 0.25\cos(2\pi u)$. Considering two levels $\alpha = 10\%$ and $\alpha = 5\%$, we approximate the probability of
rejecting $H_0$: $a_0$ constant or $H_0$: $a_1$ constant. Results are reported in Table 5. Under $H_0$, this probability has
to be close to the level $\alpha$ of the test. One can observe that using a $t(9)$-distribution for the noise does not
create size distortion. However, under the alternative $H_1$, the $t$ distribution entails a smaller power than
the standard Gaussian. This suggests that the power of our tests is reduced by fat-tailed noise, which is not
surprising. Reasonable powers are obtained when $T = 2000$, the order of the sample size used in our real
data applications.
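The rejection probabilities of this subsection are approximated by Monte Carlo frequencies. A minimal sketch of how such a frequency and its Monte Carlo standard error could be computed from a vector of simulated p-values is given below. The function name `rejection_rate` and the use of p-values (rather than a direct comparison of the statistic with a quantile) are assumptions made for illustration.

```python
import numpy as np

def rejection_rate(p_values, alpha):
    """Empirical rejection frequency at level alpha and its standard error.

    `p_values` holds one (hypothetical) p-value per simulated sample.
    """
    p_values = np.asarray(p_values, dtype=float)
    rate = float(np.mean(p_values <= alpha))
    # Binomial standard error of the Monte Carlo frequency.
    se = float(np.sqrt(rate * (1.0 - rate) / p_values.size))
    return rate, se

# Usage with four made-up p-values at the 5% level.
rate, se = rejection_rate([0.01, 0.2, 0.04, 0.5], alpha=0.05)
```

The standard error gives a sense of how far an observed frequency can drift from the nominal level by simulation noise alone.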


Table 5: Approximation of the power for testing parameter constancy in tv(1) processes

                            Setup 1                            Setup 2
                     T = 1000      T = 2000            T = 1000      T = 2000
Noise     Level      a_0    a_1    a_0    a_1          a_0    a_1    a_0    a_1
N(0,1)    5%         0.99   0.07   1      0.07         0.08   0.54   0.07   0.91
N(0,1)    10%        0.99   0.12   1      0.13         0.13   0.68   0.12   0.96
t(9)      5%         0.97   0.07   1      0.06         0.06   0.34   0.06   0.69
t(9)      10%        0.98   0.13   1      0.12         0.13   0.47   0.11   0.8


A comparison with the Gaussian quantiles. For $T = 500$ and $p = 1$, we consider Setup 1. For
$\alpha = 10\%$ and bandwidths $b$ on the grid $0.01, 0.02, \ldots, 0.30$, we compare the coverage probabilities obtained using the Monte
Carlo method with those obtained using the Gaussian quantiles, for testing
the constancy of the first lag coefficient. In Figure 6, one can see that the Monte Carlo method is attractive
because the coverage probabilities seem more accurate and less sensitive to the bandwidth parameter if we
exclude very small bandwidths.



Figure 6: Coverage probabilities for Gaussian inputs (left), $t(9)$ inputs (middle) and the difference of two
independent random variables following an exponential distribution with parameter 1 (right). Dashed lines
represent the coverage probabilities obtained with the Gaussian quantiles.

K.3 Power curves for testing non time-varying coefficients in a tv(2) process 

In this subsection, we simulate approximations of the power for testing $H_0$: $a_0$ constant (resp. $a_1$ constant,
$a_2$ constant, $(a_1,a_2)$ constant) when aQ{u) - 2(1 + Q&milnu)), ai{u) - 0.2 + | sin(27ra), a 2 {u) = 0.2 +
I cosilnu) with 0 < 6 < 0.45. The noise distribution will be either Gaussian or a Student distribution with
9 degrees of freedom (recall that Theorem 5 is only valid under a moment condition on the noise). Figure 7 represents an

approximation of the power curves when $T = 2500$ and $\alpha = 10\%$. One can observe that a fatter tail for
the noise leads to slightly smaller power for our tests.
Figure 7: Power curves when the noise is Gaussian (on the left) or follows a $t(9)$-distribution (on the right).
The legend for the curves is: - for $a_0$ constant, -- for $(a_1,a_2)$ constant, + for $a_1$ constant and * for $a_2$
constant.

K.4 Testing the second order dynamic in the semiparametric model 

Here we assume that $\beta = (a_1,\ldots,a_p)'$. The null hypothesis is $\beta = 0$. We use the procedure described in
the paper after choosing the bandwidth parameter $b$ by cross-validation. We restrict our study to the case
$p = 2$. For the simulation setup, we consider two scenarios. In setup 1, we consider a constant intercept
$a_0(u) = 10^{-4}$. In setup 2, $a_0$ is a piecewise affine function such that $a_0(0) = a_0(0.5) = a_0(1) = 10^{-4}$ and
$a_0(0.25) = a_0(0.75) = 4\cdot 10^{-4}$. The noise distribution will be Gaussian, $t(9)$, or $t(5)$. We also consider
two sample sizes: $T = 500$ and $T = 1000$. Table 6 and Table 7 provide approximations of the coverage
probabilities. In Figure 8, approximations of some power curves are given under the alternative $a_1 = a_2 =
B \times 0.02$, with $B = 0,\ldots,6$. The results seem satisfactory for the three noise distributions.
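The piecewise affine intercept of setup 2 takes one level at $u \in \{0, 0.5, 1\}$ and another at $u \in \{0.25, 0.75\}$, with linear interpolation in between. A possible way to code such a function is sketched below; the function name and the arguments `low` and `high` (the two levels) are hypothetical and left to the user.

```python
import numpy as np

def piecewise_affine_intercept(u, low, high):
    """Piecewise affine function on [0, 1] equal to `low` at 0, 0.5 and 1
    and to `high` at 0.25 and 0.75 (linear between consecutive knots)."""
    knots = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
    values = np.array([low, high, low, high, low], dtype=float)
    return np.interp(u, knots, values)

# Usage: evaluate on a grid of rescaled times with illustrative levels 1 and 4.
grid = np.linspace(0.0, 1.0, 5)
levels = piecewise_affine_intercept(grid, low=1.0, high=4.0)
```

Because `np.interp` accepts arrays, the same function can be passed directly to a simulation routine that evaluates the intercept at $t/T$.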


Table 6: Approximation of the coverage probabilities when T = 500

                 Setup 1                 Setup 2
Noise       α = 10%   α = 5%        α = 10%   α = 5%
N(0,1)       0.92      0.95          0.93      0.97
t(9)         0.92      0.95          0.92      0.95
t(5)         0.91      0.94          0.91      0.94

Table 7: Approximation of the coverage probabilities when T = 1000

                 Setup 1                 Setup 2
Noise       α = 10%   α = 5%        α = 10%   α = 5%
N(0,1)       0.90      0.96          0.88      0.94
t(9)         0.90      0.94          0.87      0.93
t(5)         0.90      0.93          0.90      0.93
Figure 8: Power curves for testing the second order dynamic. Setup 1 with T = 500 (top left), setup 2 with
T = 500 (top right), setup 1 with T = 1000 (bottom left), setup 2 with T = 1000 (bottom right).

K.5 Information criterion for the number of lags in tv-ARCH processes 

In this subsection, we study numerically the performance of the information criterion used for selecting
the number of lags in time-varying ARCH processes. We first consider the case $p = 1$, with
$a_0(u) = 2(1 + 0.4\sin(2\pi u))$ and $a_1(u) = 0.3$. Two distributions are considered for the noise: the standard Gaussian
and the $t(9)$ distribution. In Table 8, we simulate $B = 2000$ models for both noise distributions and three
sample sizes: $T = 500$, $T = 1000$ and $T = 2000$. The results are satisfactory for large sample sizes (at least 94% of
models correctly fitted when $T = 2000$). As for the
estimation, one observes that the performance of the criterion is sensitive to the tail of the noise distribution.
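The percentages reported in the tables of this subsection can be tallied as in the following sketch. It assumes that a model is counted as underfitted when the selected number of lags is smaller than the true one and as overfitted when it is larger; the function name `fit_percentages` is hypothetical.

```python
import numpy as np

def fit_percentages(selected, true_p):
    """Percentages of correctly fitted (CF), underfitted (UF) and
    overfitted (OF) models, given the selected lag orders."""
    selected = np.asarray(selected)
    return {
        "CF": float(100.0 * np.mean(selected == true_p)),
        "UF": float(100.0 * np.mean(selected < true_p)),
        "OF": float(100.0 * np.mean(selected > true_p)),
    }

# Usage with four made-up selections when the true number of lags is 1.
summary = fit_percentages([1, 1, 0, 2], true_p=1)
```

The three percentages always sum to 100, which provides a quick consistency check on a reported row.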
Table 8: Percentages of models correctly fitted (CF), underfitted (UF) and overfitted (OF) for Setup 1

Noise      T        CF    UF    OF
N(0,1)     500      88     6     6
N(0,1)     1000     97     2     1
N(0,1)     2000     99     0     1
t(9)       500      80    14     6
t(9)       1000     90     8     2
t(9)       2000     94     5     1


In a second simulation setup, we consider the cases $p = 0, 1, 2$, using the same intercept $a_0$ and setting
$a_1(u) = 0.2 + 0.2\sin(2\pi u)$, $a_2(u) = 0.2 + 0.2\cos(2\pi u)$. In this case, the lag coefficients can be arbitrarily
close to zero and the true model is more difficult to select. Numerical experiments are reported in Table 9.
When $\xi_0 \sim t(9)$, large sample sizes are necessary to obtain good results. Once again, one can explain this
behavior by the difficulty of getting accurate estimates with such a noise distribution tail when the sample size
is not large enough.


Table 9: Percentages of correctly fitted, underfitted and overfitted models for Setup 2

                        p = 0              p = 1              p = 2
Noise      T         CF   UF   OF       CF   UF   OF       CF   UF   OF
N(0,1)     500       93    0    7       78   16    6       74   22    4
N(0,1)     1000      93    0    7       92    6    2       91    7    2
N(0,1)     2000      96    0    4       99    0    1       99    1    0
t(9)       500       91    0    9       66   28    6       58   37    5
t(9)       1000      95    0    5       83   14    3       78   20    2
t(9)       2000      96    0    4       94    5    1       95    4    1